Schedule pod on specific node without modifying PodSpec - kubernetes

As a k8s cluster administrator, I want to specify on which nodes (using labels) pods will be scheduled, but without modifying any PodSpec section.
So, nodeSelector, affinity and taints can't be options.
Is there any other solution?
PS: the reason I can't modify the PodSpec is that the deployed applications are available as Helm charts and I don't have control over those files. Moreover, if I change the PodSpec, the change will be lost on the next release upgrade.

You can use the PodNodeSelector admission controller for this:
This admission controller has the following behavior:
If the Namespace has an annotation with a key scheduler.alpha.kubernetes.io/node-selector, use its value as the node selector.
If the namespace lacks such an annotation, use the clusterDefaultNodeSelector defined in the PodNodeSelector plugin configuration file as the node selector.
Evaluate the pod’s node selector against the namespace node selector for conflicts. Conflicts result in rejection.
Evaluate the pod’s node selector against the namespace-specific whitelist defined in the plugin configuration file. Conflicts result in rejection.
First of all, you will need to enable this admission controller. The way to enable it depends on your environment, but it's done via the kube-apiserver flag --enable-admission-plugins=PodNodeSelector.
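For example, on a kubeadm-based cluster (an assumption; the exact location depends on how your control plane is deployed) you would append the plugin to that flag in the kube-apiserver static pod manifest:
# /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout; adjust for your setup)
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-admission-plugins=NodeRestriction,PodNodeSelector   # keep any plugins already listed and add PodNodeSelector
    ...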
Then create a namespace and annotate it with whatever node label you want all Pods in that namespace to have:
kubectl create ns node-selector-test
kubectl annotate ns node-selector-test \
scheduler.alpha.kubernetes.io/node-selector=mynodelabel=mynodelabelvalue
To test it you could do something like this:
kubectl run busybox \
-n node-selector-test -it --restart=Never --attach=false --image=busybox
kubectl get pod busybox -n node-selector-test -o yaml
It should output something like this:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  ....
spec:
  ...
  nodeSelector:
    mynodelabel: mynodelabelvalue
Now, unless that label exists on some nodes, this Pod will never be scheduled, so put this label on a node to see it scheduled:
kubectl label node myfavoritenode mynodelabel=mynodelabelvalue

No, there is no other way to specify on which nodes pods will be scheduled; only labels and selectors.
I think the problem with Helm is related to that issue.
For now, the only way is to change the spec, remove the release, and deploy a new one with the updated spec.
Update:
@Janos Lenart's answer provides a way to manage scheduling per namespace. That is a good idea if your releases are already split among namespaces and you don't want to spawn different pods on different nodes within a single release. Otherwise, you will have to create new releases in new namespaces, and in that case I highly recommend using selectors in the Pod specs.

Related

Is kubectl apply safe enough to update all pods no matter how they were created?

A pod can be created by a Deployment, a ReplicaSet, or a DaemonSet. If I am updating a pod's container spec, is it OK for me to simply modify the YAML file that created the pod? Would it be erroneous once I have done that?
Brief question:
Is kubectl apply -f xxx.yml the silver bullet for all pod updates?
...if I am updating a pod's container specs, is it OK for me to simply modify the yaml file that created the pod?
Since the pod spec is part of the controller spec (e.g. Deployment, DaemonSet), to update the container spec you naturally start with the controller spec. Also, a running pod is largely immutable; there isn't much you can change directly unless you do a replace, which is what the controller is already doing.
You should not make changes to the pods directly, but update the spec.template.spec section of the deployment used to create the pods.
The reason for this is that the deployment is the controller that manages the replicasets and therefore the pods that are created for your application. That means if you apply changes to the pod manifest directly and something like a pod rescheduling/restart happens, those changes will be lost, because the replicaset will recreate the pod according to its own specification and not the specification of the last running pod.
You are safe to use kubectl apply to apply changes to existing resources, but if you are unsure you can always extract the current state of the deployment from Kubernetes and pipe that output into a YAML file to create a backup:
kubectl get deploy/<name> --namespace <namespace> -o yaml > deploy.yaml
Another option is to use the internal rollback mechanism of Kubernetes to restore a previous revision of your deployment. See https://learnk8s.io/kubernetes-rollbacks for more info on that.
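For reference, a sketch of that rollback flow with kubectl (deployment name and namespace are placeholders):
# list the recorded revisions of the deployment
kubectl rollout history deploy/<name> --namespace <namespace>
# roll back to the previous revision, or to a specific one
kubectl rollout undo deploy/<name> --namespace <namespace>
kubectl rollout undo deploy/<name> --namespace <namespace> --to-revision=2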

Fail to upgrade operator in K8s

I'm writing an operator with operator-sdk, and I create a StatefulSet in the operator using the k8s API like:
r.client.Create(context.TODO(), statefulset)
It works correctly and the StatefulSet pod is created. But now I want to upgrade the operator already running in k8s so that I can add a command for the pod, like:
Containers: []corev1.Container{{
    Command: []string{.....},
}},
First I build the newer operator image and delete the operator in k8s. K8s quickly restarts the operator using the newer image (kubectl describe pod myoperator shows the newer image is used).
Second I delete the StatefulSet pod, and k8s also restarts the StatefulSet pod in seconds.
But I find the StatefulSet pod doesn't contain the command I added in the operator (kubectl describe pod statefulsetpod). If I delete all the resources in k8s and redeploy them, it works.
I have a lot of resources that need to be created by the operator, so I don't want to redeploy everything.
You should delete the StatefulSet itself instead of the StatefulSet's pod. The problem is that when you delete the StatefulSet's pod, a new pod is automatically created from the old StatefulSet spec.
Once you delete/recreate the StatefulSet, the properly updated pods are scheduled as expected.
You could probably also add additional logic to the operator that patches the already existing StatefulSet; that would avoid redeploying the StatefulSet each time.

Is there a way to apply a different ConfigMap for each pod generated by a DaemonSet?

I am using Filebeat as a DaemonSet and I would like each generated pod to export to a single port for Logstash.
Is there an approach to be used for this?
No. You cannot provide a different ConfigMap to the pods of the same DaemonSet or Deployment. If you want each pod of your DaemonSet to have a different configuration, you can mount a local volume (using hostPath) so that all the pods take their configuration from that path, and that configuration can differ on each node. Or you can deploy different DaemonSets with different ConfigMaps and select different nodes for each of them, as sketched below.
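As a rough sketch of that second approach (the names, labels and image are illustrative assumptions), each DaemonSet pins itself to a set of nodes with a nodeSelector and mounts its own ConfigMap:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat-group-a          # create e.g. filebeat-group-b with its own ConfigMap for other nodes
spec:
  selector:
    matchLabels:
      app: filebeat
      group: a
  template:
    metadata:
      labels:
        app: filebeat
        group: a
    spec:
      nodeSelector:
        filebeat-config: group-a  # label the nodes: kubectl label node <node> filebeat-config=group-a
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.17.0
        volumeMounts:
        - name: config
          mountPath: /usr/share/filebeat/filebeat.yml
          subPath: filebeat.yml
      volumes:
      - name: config
        configMap:
          name: filebeat-config-group-a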
As you can read here:
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod.
...a copy of a Pod based on a single template, and this is the reason why you cannot specify different ConfigMaps to be used by different Pods managed by the DaemonSet controller.
As an alternative, you can configure many different DaemonSets, where each one is responsible for running a copy of the Pod specified in its template only on specific nodes.
Another alternative is using static pods:
It is possible to create Pods by writing a file to a certain directory
watched by Kubelet. These are called static pods. Unlike DaemonSet,
static Pods cannot be managed with kubectl or other Kubernetes API
clients. Static Pods do not depend on the apiserver, making them
useful in cluster bootstrapping cases. Also, static Pods may be
deprecated in the future.
The whole procedure of creation a static Pod is described here.
I hope it helps.
You can use a ConfigMap containing per-node config on each node and expose spec.nodeName as an environment variable in your pods. Then each pod knows which node it is running on and can decide which config to load.
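A minimal sketch of the downward API part (the container name and image are illustrative assumptions):
spec:
  containers:
  - name: filebeat
    image: docker.elastic.co/beats/filebeat:7.17.0
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName   # the pod can use $NODE_NAME to pick which config to load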

Pod gets recreated after deletion

I'm unable to delete the Kubernetes pod; it keeps getting recreated.
There's no service or deployment associated with the pod. There is a label on the pod though; is that the root cause?
If I edit the label out with kubectl edit pod podname, it removes the label from the pod but creates a new pod with the same label at the same time.
Pods can be created by ReplicationControllers or ReplicaSets. The latter might be created by a Deployment. The described behavior strongly indicates that the Pod is managed by one of these two.
You can check for these with these commands:
kubectl get rs
kubectl get rc
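You can also ask the Pod itself which controller owns it; the ownerReferences metadata names the managing object (pod name and namespace are placeholders):
kubectl get pod <podname> -n <namespace> -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
This prints something like ReplicaSet/<rs-name>; deleting that owner (and its Deployment, if the ReplicaSet itself is owned by one) stops the Pod from being recreated.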

What is the difference between a pod and a deployment?

I have been creating pods with kind: Deployment, but I see that some documentation uses kind: Pod, more specifically the documentation for multi-container pods:
apiVersion: v1
kind: Pod
metadata:
  name: ""
  labels:
    name: ""
  namespace: ""
  annotations: []
  generateName: ""
spec:
  ? "// See 'The spec schema' for details."
  : ~
But to create pods I can just use a Deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ""
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: ""
    spec:
      containers:
        etc
I noticed the pod documentation says:
The create command can be used to create a pod directly, or it can
create a pod or pods through a Deployment. It is highly recommended
that you use a Deployment to create your pods. It watches for failed
pods and will start up new pods as required to maintain the specified
number. If you don’t want a Deployment to monitor your pod (e.g. your
pod is writing non-persistent data which won’t survive a restart, or
your pod is intended to be very short-lived), you can create a pod
directly with the create command.
Note: We recommend using a Deployment to create pods. You should use
the instructions below only if you don’t want to create a Deployment.
But this raises the question of what kind: Pod is good for. Can you somehow reference pods in a deployment? I didn't see a way. It looks like what you get with pods is some extra metadata but none of the deployment options such as replicas or a restart policy. What good is a pod that doesn't persist data or survive a restart? I think I'd be able to create a multi-container pod with a deployment as well.
Radek's answer is very good, but I would like to pitch in from my experience: you will almost never use an object with the kind Pod, because that doesn't make much sense in practice.
That's because you need a Deployment object (or another Kubernetes API object like a ReplicationController or ReplicaSet) to keep the replicas (pods) alive; that's kind of the point of using Kubernetes.
What you will use in practice for a typical application are:
A Deployment object (where you specify your app's container/containers) that will host your app's container with some other specifications.
A Service object (a grouping object that provides a so-called virtual IP (cluster IP) for the pods that carry a certain label; those pods are basically the app containers that you deployed with the former Deployment object).
You need the Service object because the pods from the Deployment object can be killed, scaled up, and scaled down, and you can't rely on their IP addresses because they will not be persistent.
So you need an object like a Service that gives those pods a stable IP.
Just wanted to give you some context around pods, so you know how things work together.
Hope that clears a few things for you, not long ago I was in your shoes :)
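To make that concrete, here is a minimal sketch of such a Deployment/Service pairing; every name, label, image and port below is a placeholder:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app        # the Service routes to whichever pods carry this label
  ports:
  - port: 80
    targetPort: 8080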
Both Pod and Deployment are full-fledged objects in the Kubernetes API. Deployment manages creating Pods by means of ReplicaSets. What it boils down to is that Deployment will create Pods with spec taken from the template. It is rather unlikely that you will ever need to create Pods directly for a production use-case.
Kubernetes has three Object Types you should know about:
Pods - runs one or more closely related containers
Services - sets up networking in a Kubernetes cluster
Deployment - Maintains a set of identical pods, ensuring that they have the correct config and that the right number of them exist.
Pods:
Runs a single set of containers
Good for one-off dev purposes
Rarely used directly in production
Deployment:
Runs a set of identical pods
Monitors the state of each pod, updating as necessary
Good for dev
Good for production
And I would agree with the other answers: forget about plain Pods and just use a Deployment. Why? Look at the second bullet point: it monitors the state of each pod, updating as necessary.
That way, instead of struggling with error messages such as this one:
Forbidden: pod updates may not change fields other than spec.containers[*].image
you just refactor or completely recreate your Pod as a Deployment that creates a pod to do what you need done. With a Deployment you can change any piece of configuration you want to, and you need not worry about seeing that error message.
A Pod is a container instance.
That is the output of replicas: 3.
Think of it this way: one Deployment can have many running instances (replicas).
# deployment.yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: tomcat-deployment222
spec:
  selector:
    matchLabels:
      app: tomcat
  replicas: 3
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      containers:
      - name: tomcat
        image: tomcat:9.0
        ports:
        - containerPort: 8080
I want to add some information from the Kubernetes in Action book, so you can see the whole picture and how Kubernetes resources like Pod, Deployment and ReplicationController (ReplicaSet) relate to each other.
Pods
are the basic deployable unit in Kubernetes. But in real-world use cases you want your deployments to stay up and running automatically and remain healthy without any manual intervention. For this, the recommended approach is to use a Deployment, which under the hood creates a ReplicaSet.
A ReplicaSet, as the name implies, is a set of replicas (Pods) maintained with their Revision history.
(ReplicaSet extends an older object called ReplicationController -- which is exactly the same but without the Revision history.)
A ReplicaSet constantly monitors the list of running pods and makes sure the running number of pods matching a certain specification always matches the desired number.
Removing a pod from the scope of the ReplicationController comes in handy
when you want to perform actions on a specific pod. For example, you might
have a bug that causes your pod to start behaving badly after a specific amount
of time or a specific event.
A Deployment
is a higher-level resource meant for deploying applications and updating them declaratively.
When you create a Deployment, a ReplicaSet resource is created underneath (eventually more of them). ReplicaSets replicate and manage pods, as well. When using a Deployment, the actual pods are created and managed by the Deployment’s ReplicaSets, not by the Deployment directly
Let’s think about what has happened. By changing the pod template in your Deployment resource, you’ve updated your app to a newer version—by changing a single field!
Finally, rolling a Deployment back, either to the previous revision or to any earlier revision, is very easy with the Deployment resource.
A Pod is a collection of containers and the basic object of Kubernetes. All containers of a pod run on the same node.
Not suitable for production
No rolling updates
A Deployment is a kind of controller in Kubernetes.
Controllers use a Pod template that you provide to create the Pods for which they are responsible.
A Deployment creates a ReplicaSet, which in turn makes sure that
currentReplicas is always the same as desiredReplicas.
Advantages:
You can roll out and roll back your changes using a Deployment (see the commands sketched after this list)
Monitors the state of each pod
Best suited for production
Supports rolling updates
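As a quick illustration of the first two points (the deployment and image names are placeholders):
# trigger a rolling update by changing the pod template, then watch it progress
kubectl set image deployment/my-app my-app=my-app:1.1
kubectl rollout status deployment/my-app
# if the new version misbehaves, roll back
kubectl rollout undo deployment/my-app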
In Kubernetes we can deploy our workloads using different types of API objects like Pods, Deployments, ReplicaSets, ReplicationControllers and StatefulSets.
Of those, Pods are the smallest deployable unit in Kubernetes. Any workload/application that runs in Kubernetes has to run inside a container that is part of a Pod. A Pod can run multiple containers (meaning multiple applications) within it. A Pod is a wrapper on top of one or many running containers. Using a Pod, Kubernetes can control, monitor, and operate the containers.
Now, using standalone Pods we can't do a lot of things. We can't change configurations or volumes inside Pods. We can't restart a Pod if it is down.
So there is another API object called Deployment, which maintains the desired state (how many instances, how much compute resource the application uses) of the application. A Deployment maintains multiple instances of the same application by running multiple Pods. Deployments, unlike Pods, are mutable. Deployments use another API object called ReplicaSet to maintain the desired state. A Deployment, through its ReplicaSet, spawns another Pod if one goes down.
So a Pod runs applications in containers. Deployments run Pods and maintain the desired state of the application.
Try to avoid Pods and use Deployments instead for managing containers, as objects of kind Pod will not be rescheduled (or self-healed) in the event of a node failure or pod termination.
A Deployment is generally preferable because it defines a ReplicaSet to ensure that the desired number of Pods is always available and specifies a strategy to replace Pods, such as RollingUpdate.
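The strategy lives in the Deployment spec; a minimal sketch with illustrative values:
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod below the desired count during the update
      maxSurge: 1         # at most one extra pod above the desired count during the update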
Maybe this example will be helpful for beginners!
1) Listing pods:
controlplane $ kubectl -n my-namespace get pods
NAME                            READY   STATUS    RESTARTS   AGE
mysql                           1/1     Running   0          92s
webapp-mysql-75dfdf859f-9c54j   1/1     Running   0          92s
2) Deleting the web-app pod, which was created by a Deployment:
controlplane $ kubectl -n my-namespace delete pod webapp-mysql-75dfdf859f-9c54j
pod "webapp-mysql-75dfdf859f-9c54j" deleted
3) Listing pods (you can see it is recreated automatically):
controlplane $ kubectl -n my-namespace get pods
NAME                            READY   STATUS    RESTARTS   AGE
mysql                           1/1     Running   0          2m42s
webapp-mysql-75dfdf859f-mqrcx   1/1     Running   0          45s
4) Deleting the mysql pod, which was created directly (without a Deployment):
controlplane $ kubectl -n my-namespace delete pod mysql
pod "mysql" deleted
5) Listing pods (you can see the mysql pod is lost forever):
controlplane $ kubectl -n my-namespace get pods
NAME                            READY   STATUS    RESTARTS   AGE
webapp-mysql-75dfdf859f-mqrcx   1/1     Running   0          76s
In Kubernetes, Pods are the smallest deployable units. Every time we create a Kubernetes object like a Deployment, ReplicaSet, StatefulSet or DaemonSet, it creates pods.
As mentioned above, deployments create pods based on the desired state declared in your Deployment object. For example, if you want 5 replicas of an application, you specify replicas: 5 in your Deployment manifest. The Deployment controller is then responsible for creating 5 identical replicas (no less, no more) of the given application, with all metadata like RBAC policies, network policies, labels, annotations, health checks, resource quotas, taints/tolerations and so on, and associating them with each pod it creates.
There are some cases where you want to create a plain Pod, for example when you are running a test sidecar where you don't need the application to run forever, you don't need multiple replicas, and you only run it when you want to execute something; in that case a Pod is suitable. An example is helm test, which is a pod definition that specifies a container with a given command to run.
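A rough sketch of such a helm test pod; the chart, image and service names are placeholders, and the helm.sh/hook annotation is what marks the pod as a test in Helm 3:
apiVersion: v1
kind: Pod
metadata:
  name: my-chart-test-connection
  annotations:
    "helm.sh/hook": test        # only run by `helm test`, not as part of the release
spec:
  restartPolicy: Never
  containers:
  - name: wget
    image: busybox
    command: ['wget']
    args: ['my-chart:80']       # exits 0 if the chart's service answers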
I am also a beginner in k8s, so correct me if I am wrong.
We know that a pod is created when we create a deployment. What I observed is that if you look at the YAML file of the deployment, you can see its kind: Deployment. But if you look at the YAML file of the pod, you see its kind: Pod.