Set priorityClass by default for a namespace - Kubernetes

I would like to know how it's possible to set a priorityClass by default for all pods in a specific namespace without using a
globalDefault: true
Maybe with an admission controller, but I don't know.
Do you have a concrete example for that?

PriorityClass: A PriorityClass is a non-namespaced object.
PriorityClass also has two optional fields: globalDefault and description.
The globalDefault field indicates that the value of this PriorityClass should be used for Pods without a priorityClassName.
Only one PriorityClass with globalDefault set to true can exist in the system. If there is no PriorityClass with globalDefault set, the priority of Pods with no priorityClassName is zero.
Create a PriorityClass using the YAML below (no globalDefault flag is set):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
description: "This priority class should be used for pods."
$ kubectl get priorityclasses.scheduling.k8s.io
NAME VALUE GLOBAL-DEFAULT AGE
high-priority 1000000 false 10s
Now add the priority class to the pod manifest and schedule pods in your namespace:
$ kubectl create namespace priority-test
namespace/priority-test created
$ kubectl get namespaces
NAME STATUS AGE
default Active 43m
kube-node-lease Active 43m
kube-public Active 43m
kube-system Active 43m
priority-test Active 5s
Example: pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  priorityClassName: high-priority
$ kubectl apply -f pod.yaml -n priority-test
pod/nginx created
ubuntu@k8s-master-1:~$ kubectl get all -n priority-test
NAME READY STATUS RESTARTS AGE
pod/nginx 1/1 Running 0 25s
$ kubectl describe pod -n priority-test nginx | grep -i priority
Namespace: priority-test
Priority: 1000000
Priority Class Name: high-priority
Normal Scheduled <unknown> default-scheduler Successfully assigned priority-test/nginx to worker-1

Currently, per-namespace priorities are not possible.
But you can achieve a similar result if you instead set a default PriorityClass with globalDefault: true and e.g. value: 1000. Then create another, lower PriorityClass with e.g. value: 100 and add it to all dev/staging pods.
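A minimal sketch of that approach (the class names and values below are illustrative, not from the question):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 1000
globalDefault: true
description: "Default for all pods that do not set a priorityClassName."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: dev-low-priority
value: 100
description: "Lower priority for dev/staging pods; reference it via priorityClassName: dev-low-priority."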
Btw., while not directly related to the question, it would be much easier to accomplish what you need if you used nodeSelectors and scheduled dev pods to separate nodes. This way production pods don't have to compete for resources with non-essential pods.
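A minimal sketch of that nodeSelector approach, assuming the dev nodes carry a hypothetical label env=dev (applied beforehand with, for example, kubectl label node worker-2 env=dev):
apiVersion: v1
kind: Pod
metadata:
  name: dev-nginx
spec:
  nodeSelector:
    env: dev          # hypothetical label on the dev nodes
  containers:
  - name: nginx
    image: nginx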

Related

Kubernetes resource quota, have non schedulable pod staying in pending state

So I wish to limit the resources used by pods running in each of my namespaces, and therefore want to use resource quotas.
I am following this tutorial.
It works well, but I wish something a little different.
When trying to schedule a pod which will go over the limit of my quota, I am getting a 403 error.
What I wish is for the request to be accepted, with the pod waiting in a Pending state until one of the other pods ends and frees some resources.
Any advice?
Instead of using straight Pod definitions (kind: Pod), use a Deployment.
Why?
Pods in Kubernetes are designed as relatively ephemeral, disposable entities:
You'll rarely create individual Pods directly in Kubernetes—even singleton Pods. This is because Pods are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or indirectly by a controller), the new Pod is scheduled to run on a Node in your cluster. The Pod remains on that node until the Pod finishes execution, the Pod object is deleted, the Pod is evicted for lack of resources, or the node fails.
Kubernetes assumes that for managing pods you should use workload resources instead of creating pods directly:
Pods are generally not created directly and are created using workload resources. See Working with Pods for more information on how Pods are used with workload resources.
Here are some examples of workload resources that manage one or more Pods:
Deployment
StatefulSet
DaemonSet
By using a Deployment you will get behaviour very similar to the one you want.
Example below:
Let's suppose that I created a pod quota for a custom namespace, set to "2" as in this example, and I have two pods running in this namespace:
kubectl get pods -n quota-demo
NAME READY STATUS RESTARTS AGE
quota-demo-1 1/1 Running 0 75s
quota-demo-2 1/1 Running 0 6s
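For reference, a minimal sketch of the quota used in this namespace (the name pod-demo and the limit of 2 pods are taken from the error message shown below):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pod-demo
spec:
  hard:
    pods: "2"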
Third pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo-3
spec:
  containers:
  - name: quota-demo-3
    image: nginx
    ports:
    - containerPort: 80
Now I will try to apply this third pod in this namespace:
kubectl apply -f pod.yaml -n quota-demo
Error from server (Forbidden): error when creating "pod.yaml": pods "quota-demo-3" is forbidden: exceeded quota: pod-demo, requested: pods=1, used: pods=2, limited: pods=2
The pod is rejected outright, which is not the behaviour we want.
Now I will change the pod definition into a deployment definition:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quota-demo-3-deployment
  labels:
    app: quota-demo-3
spec:
  selector:
    matchLabels:
      app: quota-demo-3
  template:
    metadata:
      labels:
        app: quota-demo-3
    spec:
      containers:
      - name: quota-demo-3
        image: nginx
        ports:
        - containerPort: 80
I will apply this deployment:
kubectl apply -f deployment-v3.yaml -n quota-demo
deployment.apps/quota-demo-3-deployment created
The deployment is created successfully, but there is no new pod. Let's check this deployment:
kubectl get deploy -n quota-demo
NAME READY UP-TO-DATE AVAILABLE AGE
quota-demo-3-deployment 0/1 0 0 12s
We can see that the pod quota is working: the deployment is monitoring resources and waiting for the possibility to create a new pod.
Let's now delete one of the pods and check the deployment again:
kubectl delete pod quota-demo-2 -n quota-demo
pod "quota-demo-2" deleted
kubectl get deploy -n quota-demo
NAME READY UP-TO-DATE AVAILABLE AGE
quota-demo-3-deployment 1/1 1 1 2m50s
The pod from the deployment is created automatically after the old pod is deleted:
kubectl get pods -n quota-demo
NAME READY STATUS RESTARTS AGE
quota-demo-1 1/1 Running 0 5m51s
quota-demo-3-deployment-7fd6ddcb69-nfmdj 1/1 Running 0 29s
It works the same way for memory and CPU quotas on a namespace: when the resources are freed, the deployment will automatically create new pods.
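A minimal sketch of such a compute quota (the name and values are illustrative, not from the tutorial):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
As with the pod-count quota, a deployment whose new pod would exceed these limits simply stays not ready until resources are freed.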

Kubernetes HPA pod custom metrics shows as <unknown>

I have managed to install Prometheus and its adapter, and I want to use one of the pod metrics for autoscaling.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . | grep "pods/http_request"
"name": "pods/http_request_duration_milliseconds_sum",
"name": "pods/http_request",
"name": "pods/http_request_duration_milliseconds",
"name": "pods/http_request_duration_milliseconds_count",
"name": "pods/http_request_in_flight",
Checking the API, I want to use pods/http_request and have added it to my HPA configuration:
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: app
  namespace: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 4
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_request
      target:
        type: AverageValue
        averageValue: 200
After applying the YAML and checking the HPA status, the metric shows up as <unknown>:
$ k apply -f app-hpa.yaml
$ k get hpa
NAME REFERENCE TARGETS
app Deployment/app 306214400/2000Mi, <unknown>/200 + 1 more...
But when using other pod metrics such as pods/memory_usage_bytes, the value is properly detected.
Is there a way to check the proper values for this metric? And how do I properly add it to my HPA configuration?
Reference https://www.ibm.com/support/knowledgecenter/SSBS6K_3.2.0/manage_cluster/hpa.html
First, deploy the metrics server; it should be up and running.
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
After a few seconds the metrics server is deployed. Check the HPA; it should now resolve the metrics.
$ kubectl get deployment -A
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
.
.
kube-system metrics-server 1/1 1 1 34s
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
ha-xxxx-deployment Deployment/xxxx-deployment 1%/5% 1 10 1 6h46m
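To inspect the value the adapter actually reports for the custom metric itself, you can also query the custom metrics API directly; a sketch assuming the app namespace from the HPA above:
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/app/pods/*/http_request" | jq .
If this returns no items or an error, the adapter is not exposing the metric for those pods, which is a common reason for <unknown> in the HPA targets.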

kubectl delete deployment not removing pods and replicasets

We run the following command in k8s
kubectl delete deployment ${our-deployment-name}
And this seems to delete the deployment called our-deployment-name fine. However, we also want to delete the replicasets and pods that belong to 'our-deployment-name'.
Reading the documents it is not clear if the default behaviour should cascade-delete replicasets and pods. Does anybody know how to delete the deployment and all related replicasets and pods? Or do I have to manually delete all of those resources as well?
When I delete a deployment I have an orphaned replicaset like this...
dev@jenkins:~$ kubectl describe replicaset.apps/wc-892-74697d58d9
Name:           wc-892-74697d58d9
Namespace:      default
Selector:       app=wc-892,pod-template-hash=74697d58d9
Labels:         app=wc-892
                pod-template-hash=74697d58d9
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/wc-892
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=wc-892
           pod-template-hash=74697d58d9
  Containers:
   wc-892:
    Image:      registry.digitalocean.com/galatea/wastecoordinator-wc-892:1
    Port:       8080/TCP
    Host Port:  0/TCP
    Limits:
      memory:  800Mi
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
  Priority Class Name:  dev-lower-priority
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  11m   replicaset-controller  Created pod: wc-892-74697d58d9-jtj9t
dev@jenkins:~$
As you can see, the replicaset has Controlled By: Deployment/wc-892, which means deleting the deployment wc-892 should delete the replicaset, which would in turn delete the pods with label app=wc-892.
First get the deployments which you want to delete:
kubectl get deployments
and delete the deployment which you want:
kubectl delete deployment yourdeploymentname
This will delete the replicaset and the pods associated with it.
kubectl delete deployment <deployment> will delete all ReplicaSets associated with the deployment AND the active pods associated with those ReplicaSets.
The controller-manager or API server might be having issues handling the delete request, so I'd advise looking at those logs to verify.
Note: is it possible the older replicasets are attached to something else in the namespace? Try listing them and looking at the metadata, using kubectl describe rs <rs> or kubectl get rs -o yaml.
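Two hedged commands that may help here (the replicaset and deployment names are taken from the question; --cascade=foreground requires kubectl 1.20+, older versions use --cascade=true):
# Check which object owns the orphaned replicaset
kubectl get rs wc-892-74697d58d9 -o jsonpath='{.metadata.ownerReferences}'
# Delete the deployment and wait for its replicasets and pods to be removed first
kubectl delete deployment wc-892 --cascade=foreground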

Subnetting within Kubernetes Cluster

I have couple of deployments - say Deployment A and Deployment B. The K8s Subnet is 10.0.0.0/20.
My requirement: Is it possible to have all pods in Deployment A get IPs from 10.0.1.0/24 and pods in Deployment B from 10.0.2.0/24?
This keeps the networking clean, and with the help of the IP itself a particular deployment can be identified.
A Deployment in Kubernetes is a high-level abstraction that relies on controllers to build basic objects. That is different from the objects themselves, such as a pod or service.
If you take a look at the Deployment spec in the Kubernetes API overview, you will notice that there is no such thing as defining subnets, nor IP addresses, that would be specific to a deployment, so you cannot specify subnets for deployments.
The Kubernetes idea is that pods are ephemeral. You should not try to identify resources by IP addresses, as IPs are randomly assigned; if a pod dies it will get another IP address. You could look at something like StatefulSets if you are after unique, stable network identifiers.
While Kubernetes does not support this feature, I found a workaround for it using Calico's Migrate pools feature.
First you need to have calicoctl installed. There are several ways to do that mentioned in the install calicoctl docs.
I chose to install calicoctl as a Kubernetes pod:
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
To make working with it faster you can set up an alias:
alias calicoctl="kubectl exec -i -n kube-system calicoctl /calicoctl -- "
I have created two YAML files to set up the IP pools:
# ippool1.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: pool1
spec:
  cidr: 10.0.0.0/24
  ipipMode: Always
  natOutgoing: true

# ippool2.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: pool2
spec:
  cidr: 10.0.1.0/24
  ipipMode: Always
  natOutgoing: true
Then you have to apply the configuration, but since my YAML files were placed in my host filesystem and not in the calicoctl pod itself, I passed the YAML as input to the command:
➜ cat ippool1.yaml | calicoctl apply -f-
Successfully applied 1 'IPPool' resource(s)
➜ cat ippool2.yaml | calicoctl apply -f-
Successfully applied 1 'IPPool' resource(s)
Listing the IP pools, you will notice the newly added ones:
➜ calicoctl get ippool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED SELECTOR
default-ipv4-ippool 192.168.0.0/16 true Always Never false all()
pool1 10.0.0.0/24 true Always Never false all()
pool2 10.0.1.0/24 true Always Never false all()
Then you can specify which pool you want to use for your deployment:
---
apiVersion: apps/v1            # apiVersion/kind assumed; the original snippet was truncated
kind: Deployment
metadata:
  labels:
    app: nginx
  name: deployment1-pool1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      annotations:
        cni.projectcalico.org/ipv4pools: "[\"pool1\"]"
      labels:
        app: nginx
    spec:
      containers:              # container spec assumed (nginx, matching the app label)
      - name: nginx
        image: nginx
I have created a similar one, deployment2-pool2, that used pool2, with the results below:
deployment1-pool1-6d9ddcb64f-7tkzs 1/1 Running 0 71m 10.0.0.198 acid-fuji
deployment1-pool1-6d9ddcb64f-vkmht 1/1 Running 0 71m 10.0.0.199 acid-fuji
deployment2-pool2-79566c4566-ck8lb 1/1 Running 0 69m 10.0.1.195 acid-fuji
deployment2-pool2-79566c4566-jjbsd 1/1 Running 0 69m 10.0.1.196 acid-fuji
Also, it's worth mentioning that while testing this I found out that if your deployment has many replicas and runs out of IPs in its pool, Calico will then use a different pool.

kubernetes hidden replica set?

I'm learning Kubernetes and just came across an issue and would like to check if anyone else has come across it,
user@ubuntu:~/rc$ kubectl get rs ### don’t see any replica set
user@ubuntu:~/rc$
user@ubuntu:~/rc$
user@ubuntu:~/rc$ kubectl get pod
NAME READY STATUS RESTARTS AGE
bigwebstuff-673k9 1/1 Running 0 7m
bigwebstuff-cs7i3 1/1 Running 0 7m
bigwebstuff-egbqd 1/1 Running 0 7m
user@ubuntu:~/rc$
user@ubuntu:~/rc$
user@ubuntu:~/rc$ kubectl delete pod bigwebstuff-673k9 bigwebstuff-cs7i3 #### delete pods
pod "bigwebstuff-673k9" deleted
pod "bigwebstuff-cs7i3" deleted
user@ubuntu:~/rc$
user@ubuntu:~/rc$ kubectl get pod #### the deleted pods regenerated
NAME READY STATUS RESTARTS AGE
bigwebstuff-910m9 1/1 Running 0 6s
bigwebstuff-egbqd 1/1 Running 0 8m
bigwebstuff-fksf6 1/1 Running 0 6s
You see the deleted pods are regenerated, though I can’t find the replica set, as if a hidden replica set exists somewhere.
The 3 pods were started from the YAML file below:
user@ubuntu:~/rc$ cat webrc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: bigwebstuff
  labels:
    name: bigwebstuff
spec:
  replicas: 3
  selector:
    run: testweb
  template:
    metadata:
      labels:
        run: testweb
    spec:
      containers:
      - name: podweb
        image: nginx
        ports:
        - containerPort: 80
But it didn’t show up after I used the YAML file to create the pods.
Any idea on how to find the hidden replica set? Or why the pods get regenerated?
A "ReplicaSet" is not the same thing as a "ReplicationController" (although they are similar). The kubectl get rs command lists replica sets, whereas the manifest file in your question creates a replication controller. Instead, use the kubectl get rc command to list replication controllers (or alternatively, change your manifest file to create a ReplicaSet instead of a ReplicationController).
On the difference between ReplicaSets and ReplicationControllers, let me quote the documentation:
Replica Set is the next-generation Replication Controller. The only difference between a Replica Set and a Replication Controller right now is the selector support. Replica Set supports the new set-based selector requirements as described in the labels user guide whereas a Replication Controller only supports equality-based selector requirements.
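For reference, a minimal sketch of the same workload rewritten as a ReplicaSet (the selector moves under matchLabels, as required by the apps/v1 API):
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: bigwebstuff
  labels:
    name: bigwebstuff
spec:
  replicas: 3
  selector:
    matchLabels:
      run: testweb
  template:
    metadata:
      labels:
        run: testweb
    spec:
      containers:
      - name: podweb
        image: nginx
        ports:
        - containerPort: 80
With this manifest applied, kubectl get rs would list the bigwebstuff replica set.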
Replica sets and replication controllers are not the same thing. Try the following:
kubectl get rc
And then delete accordingly.
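For example, using the controller name from the manifest in the question (a sketch):
kubectl get rc
kubectl delete rc bigwebstuff
Deleting the replication controller also deletes the pods it manages, by default.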