Adding pod nodeSelector after creation - kubernetes

Using OpenShift 3.1/K8 1.1 and given a pod that has already been created with/without a nodeSelector.
I.e.
apiVersion: v1
kind: Pod
metadata:
  generateName: blah-
  labels:
    name: blah
spec:
  containers:
  - image: some/image
    name: blah-image
    ports:
    - containerPort: 8080
  nodeSelector: # can you add this after this pod has been created?
    region: infra
Is it possible to change/add a nodeSelector?
Similar to the way you add/modify labels

You can change it in the associated ReplicationController (if any) but not in the definition of a running Pod. If you edit the RC as suggested the Pod itself must be recreated in order to start on the selected node(s).

In OpenShift if you are using a deployment config (the predecessor to Kube's Deployment object) you can edit your DC and add them. On the cli it's:
oc edit dc/NAME
That will trigger a rolling update that creates a new RC and scales down the old, unlabeled pods.
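As a rough sketch, the part of the DeploymentConfig you would edit is the pod template. Only the relevant fields are shown here, and the region: infra label is simply the one from the question; your nodes must actually carry it:

```yaml
# Fragment of a DeploymentConfig as edited via `oc edit dc/NAME` (other fields omitted).
spec:
  template:
    spec:
      nodeSelector:
        region: infra   # label from the question; pods only land on nodes with this label
```

The same spec.template.spec.nodeSelector path applies when editing a ReplicationController directly.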

Related

Pod is not getting selected by Deployment selector

I have this Deployment object:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-webserver-nginx
  annotations:
    description: This is a demo deployment for nginx webserver
  labels:
    app: deployment-webserver-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: deployment-webserver-pods
  template:
    metadata:
      labels:
        app: deployment-webserver-pods
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
My understanding of this Deployment object is that any Pod with the app:deployment-webserver-pods label will be selected. Of course, this Deployment creates 3 replicas, but I wanted to add one more Pod explicitly, so I created a Pod object with the label app:deployment-webserver-pods. Here is its definition:
apiVersion: v1
kind: Pod
metadata:
  name: deployment-webserver-nginx-extra-pod
  labels:
    app: deployment-webserver-pods
spec:
  containers:
  - name: nginx-alpine-container-1
    image: nginx:alpine
    ports:
    - containerPort: 81
My expectation was that continuously running Deployment Controller will pick this new Pod, and when I do kubectl get deploy then I will see 4 pods running. But that didn't happen.
I even tried to first create this pod with this label, and then created my Deployment and thought that maybe now this explicit Pod will be picked but still that didn't happen.
Don't Labels and Selectors work like this?
I know I can scale by deployment to 4 Replicas, but I am trying to understand how Pods / other Kubernetes objects are selected using Labels and Selectors.
From the official docs:
Note: You should not create other Pods whose labels match this
selector, either directly, by creating another Deployment, or by
creating another controller such as a ReplicaSet or a
ReplicationController. If you do so, the first Deployment thinks that
it created these other Pods. Kubernetes does not stop you from doing
this.
As described further in docs, it is not recommended to scale replicas of the deployments using the above approach.
Another important point to note from same section of docs:
If you have multiple controllers that have overlapping selectors, the
controllers will fight with each other and won't behave correctly.
My expectation was that continuously running Deployment Controller will pick this new Pod, and when I do kubectl get deploy then I will see 4 pods running. But that didn't happen.
The Deployment Controller does not work like that. It watches Deployment resources and "drives" them to the desired state. Typically that means that when anything in the template: part changes, a new ReplicaSet is created with the configured number of replicas. You cannot add a Pod to a Deployment in any way other than changing replicas:, since each instance is created from the same Pod template and is identical.
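One way to see why a hand-made Pod is not counted: a Deployment manages its Pods through a ReplicaSet, and the ReplicaSet it generates selects on an extra pod-template-hash label that the Deployment adds to its own Pods. An abridged sketch of such a generated ReplicaSet follows; HASH is a placeholder, not real output, and most fields are omitted:

```yaml
# Abridged sketch of the ReplicaSet generated by the Deployment above.
# HASH stands in for the generated hash value.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: deployment-webserver-nginx-HASH
  ownerReferences:            # links the ReplicaSet back to its Deployment
  - apiVersion: apps/v1
    kind: Deployment
    name: deployment-webserver-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: deployment-webserver-pods
      pod-template-hash: HASH # the extra Pod from the question lacks this label
```

A Pod that carries only app: deployment-webserver-pods therefore does not match this selector, so it is left alone and does not show up in the Deployment's replica count.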
Don't Labels and Selectors work like this?
... but I am trying to understand how Pods / other Kubernetes objects are selected using Labels and Selectors.
Yes, Labels and Selectors are used for many things in Kubernetes, but not for everything. When you create a Deployment with a label, a Pod with the same label, and finally a Service with a matching selector, traffic addressed to that Service is distributed to the instances of your Deployment as well as to your extra Pod.
Example:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: deployment-webserver-pods
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80 # nginx in the Pods above listens on port 80
Labels and Selectors are also useful for management, e.g. with kubectl. You can add labels for a team or an app, then select all Deployments or Pods belonging to that team or app (e.g. if the app consists of an app Deployment and a cache Deployment):
kubectl get pods -l team=myteam,app=customerservice
My expectation was that continuously running Deployment Controller
will pick this new Pod, and when I do kubectl get deploy then I will
see 4 pods running. But that didn't happen.
Kubernetes is a system that operates "declaratively", not "imperatively": you write down the desired state of the application in the cluster, typically in a YAML file, and these declared desired states define all of the pieces of your application.
If a cluster were configured imperatively, the way you are expecting, it would be very difficult to understand and replicate how the cluster came to be in that state.
Just to add to the explanations above: if we were meant to create and manage Pods manually, what would be the purpose of having controllers in K8s?
My expectation was that continuously running Deployment Controller
will pick this new Pod, and when I do kubectl get deploy then I will
see 4 pods running. But that didn't happen.
As per your yaml replicas:3 was already set so deployment would not take a new pod as the 4th replica.

Disable resource reservation for the complete kubernetes cluster

Is it somehow possible to force the scheduler to ignore the available resources on a node/cluster while scheduling new pods?
We would like to "overload" our cluster in our lab environment for testing purposes. I could not find anything about it in the docs. Thanks!
There are a bunch of feature flags you could possibly tweak to achieve this, but why not use nodeName in the pod spec and effectively bypass the scheduler?
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: kube-01
The above pod will run on the node kube-01
This doc may help. You can try to remove the filter PodFitsResources.

how to set different environment variables of Deployment replicas in kubernetes

I have 4 k8s pods by setting the replicas of Deployment to 4 now.
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  ...
  replicas: 4
  ...
Each pod fetches items from a database and consumes them. The items in the database have a column class_name.
Now I want each pod to fetch items of only one class_name.
For example, pod1 only gets items whose class_name equals class_name_1, pod2 only gets items whose class_name equals class_name_2, and so on.
So I want to pass a different class_name as an environment variable to each of the Deployment's pods. Can I define this in the Deployment's YAML file?
Or is there another way to achieve my goal (something other than a Deployment in k8s)?
For distributed job processing Deployments are not very good, because they don't have any type of ordering or consistent pod hostnames. You'd better use StatefulSet for it, because they have consistent naming, like pod-0, pod-1, pod-2. You can rely on that hostname index.
For example, if your class_name_idx - is the index of class name in class names list, num_replicas - is the number of replicas in StatefulSet and pod_idx - is the index of pod in StatefulSet, then pod should run the job only if: class_name_idx % num_replicas == pod_idx.
Unfortunately number of StatefulSet replicas cannot be obtained within the pod dynamically using Downward API, so you can either hardcode it or use Kubernetes API to obtain it from cluster.
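A minimal sketch of how each pod could be told its own name and the replica count, assuming the count is hardcoded as suggested above. The container name, image, and the variable names POD_NAME and NUM_REPLICAS are hypothetical; the fieldRef on metadata.name is the Downward API field that exposes the pod name:

```yaml
# Fragment of a StatefulSet pod template (other fields omitted).
spec:
  replicas: 4
  template:
    spec:
      containers:
      - name: worker              # hypothetical container name
        image: some/worker        # hypothetical image
        env:
        - name: POD_NAME          # e.g. "<statefulset-name>-2"
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NUM_REPLICAS      # hardcoded; keep in sync with spec.replicas above
          value: "4"
```

The application would then take the number after the last "-" in POD_NAME as pod_idx and apply the class_name_idx % num_replicas == pod_idx check itself.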
Neither a Deployment nor anything else will help you achieve your goal. Your goal is a kind of logic, and it should be implemented in your application's code.
Since a Deployment is a set of instances of the same application, the only thing that might be useful for you is using multiple Deployments, each for its own task. The first could get class_name_1 items, the next class_name_2, class_name_3 and so on. But it is not a good idea.
I would not recommend this approach, but the closest thing to what you want is using a StatefulSet and using the pod name as the index.
When you deploy a stateful set, the pods will be named after their statefulset name, in the following sample:
apiVersion: v1
kind: Service
metadata:
  name: kuard
  labels:
    app: kuard
spec:
  type: NodePort
  ports:
  - port: 8080
    name: web
  selector:
    app: kuard
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kuard
spec:
  serviceName: "kuard"
  replicas: 3
  selector:
    matchLabels:
      app: kuard
  template:
    metadata:
      labels:
        app: kuard
    spec:
      containers:
      - name: kuard
        image: gcr.io/kuar-demo/kuard-amd64:1
        ports:
        - containerPort: 8080
          name: web
The pods created by the statefulset will be named as:
kuard-0
kuard-1
kuard-2
This way you could either name the StatefulSet after the class, i.e. class-name, so that the pod created is class-name-0 (replacing the _ with -), or just strip the name off to get the index at the end.
To get the name just read the environment variable HOSTNAME
This naming is consistent, so you can make sure you always have 0, 1, 2, 3 after the name. And if the 2 goes down, it will be recreated.
Like I said, I would not recommend this approach, because it ties the infrastructure to your code, and you also can't scale (if needed), because each instance is unique and adding new instances would introduce new ids.
A better approach would be using one deployment for each class and pass the proper values as environment variables.
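A sketch of what one such per-class Deployment might look like. The Deployment name, labels, container name, image, and the CLASS_NAME variable are all hypothetical; you would create one copy per class and change only the name, labels, and value:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-class-name-1        # hypothetical; one Deployment per class
spec:
  replicas: 1
  selector:
    matchLabels:
      app: worker-class-name-1
  template:
    metadata:
      labels:
        app: worker-class-name-1
    spec:
      containers:
      - name: worker               # hypothetical container name
        image: some/worker         # hypothetical image
        env:
        - name: CLASS_NAME         # read by the application to filter items
          value: class_name_1
```

With this layout each class can be scaled independently by changing replicas on its own Deployment.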

Restart a Successful/Failed pod manually

Running kubernetes v1.2.2 on coreos on vmware:
I have a pod with the restart policy set to Never. Is it possible to manually start the same pod back up?
In my use case we will have a postgres instance in this pod. If it were to crash, I would like to leave the pod in a failed state until we can look at it more closely to see why it failed, and then start it manually, rather than have it restarted automatically with a restartPolicy of Always.
Looking through kubectl, it doesn't seem like there is a manual start option. I could delete and recreate the pod, but I think this would remove the data from my container. Maybe I should be mounting a local volume on my host so that I don't need to worry about losing data?
This is my sample pod yaml. I don't seem to be able to restart the 'health' pod.
apiVersion: v1
kind: Pod
metadata:
  name: health
  labels:
    environment: dev
    app: health
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Never
One simple method that might address your needs is to add a unique instance label, maybe a simple counter. If each pod is labelled differently you can start as many as you like and keep around as many failed instances as you like.
e.g. first pod
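apiVersion: v1
kind: Pod
metadata:
  name: health-0 # pod names must be unique within a namespace
  labels:
    environment: dev
    app: health
    instance: "0" # label values must be strings, so quote the counter
spec:
  containers: ...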
second pod
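apiVersion: v1
kind: Pod
metadata:
  name: health-1 # pod names must be unique within a namespace
  labels:
    environment: dev
    app: health
    instance: "1" # label values must be strings, so quote the counter
spec:
  containers: ...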
Based on your question and comments, it sounds like you want to restart a failed container while retaining its state and data. In fact, application containers and pods are considered relatively ephemeral (rather than durable) entities. When a container crashes, its files are lost and the kubelet restarts it with a clean state.
To retain your data and logs, use persistent volume types in your deployment. This lets you preserve data across container restarts.
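A rough sketch of a pod that keeps its data on a persistent volume, so that deleting and recreating the pod does not lose it. The pod name, the claim name postgres-data, and the mount path are assumptions for illustration; the claim has to exist already, and the settings the image needs to start are omitted:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres                    # hypothetical pod name
spec:
  restartPolicy: Never              # leave the pod in a failed state for inspection
  containers:
  - name: postgres
    image: postgres                 # required environment settings omitted
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data   # assumed data directory of the image
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres-data      # hypothetical claim; must be created beforehand
```

Because the data lives in the volume rather than in the container filesystem, a replacement pod that mounts the same claim sees the same files.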

Upgrade image in a Deployment's pods

I have a Deployment with 3 replicas of a pod running the image 172.20.20.20:5000/my_app, that is located in my private registry.
I want to do a rolling update of the Deployment when I push a new latest version of that image to my registry.
I push the new image this way (tag v3.0 to latest):
$ docker tag 172.20.20.20:5000/my_app:3.0 172.20.20.20:5000/my_app
$ docker push 172.20.20.20:5000/my_app
But nothing happens. Pods' images are not upgraded. This is my deployment definition:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: 172.20.20.20:5000/my_app:latest
        ports:
        - containerPort: 8080
Is there a way to do that automatically? Should I run a command like rolling-update, as with ReplicationControllers?
In order to upgrade a Deployment you have to modify the Deployment resource with the new image, for example by changing 172.20.20.20:5000/my_app:v1 to 172.20.20.20:5000/my_app:v2. Since you're only modifying the image within the Docker registry, Kubernetes doesn't notice the change.
If you (manually) kill the individual Pods, the Deployment will restart them. Since the Deployment image specifies the "latest" tag Kubernetes will download the latest version (now "v3" in your case) due to the implied "Always" ImagePullPolicy.
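As a sketch of the first option, the change that triggers a rollout is an edit to the image field in the pod template. The 3.0 tag below is the one from the question; this assumes you push each version under its own tag rather than relying only on latest:

```yaml
# Fragment of the Deployment above; only the image tag changes.
spec:
  template:
    spec:
      containers:
      - name: app
        image: 172.20.20.20:5000/my_app:3.0   # explicit tag instead of :latest
```

Applying the edited manifest, or running kubectl set image deployment/myapp-deployment app=172.20.20.20:5000/my_app:3.0, changes the pod template, and the Deployment then replaces its pods with a rolling update.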