PodTopologySpreadConstraint - DoNotSchedule not working as expected - Kubernetes

I tried to use a PodTopologySpreadConstraint in our deployment setup to spread replicas across zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx
      tolerations:
      - effect: NoSchedule
        key: platform
        operator: Equal
        value: "true"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodes
                operator: In
                values:
                - platform
I expected that, with just one node labeled "platform", the scheduler would run only one replica of nginx and leave the others in a Pending state, because the DoNotSchedule constraint cannot be satisfied.
But the Deployment is running all 4 replicas on the same node/zone, even though DoNotSchedule is defined.
➜ kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-6cbbb547c-6qvrg 1/1 Running 0 15m 192.168.33.28 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
nginx-deployment-6cbbb547c-cgbcf 1/1 Running 0 15m 192.168.37.251 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
nginx-deployment-6cbbb547c-dlqvd 1/1 Running 0 15m 192.168.47.173 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
nginx-deployment-6cbbb547c-hkgzc 1/1 Running 0 15m 192.168.39.27 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
And I have just one node labeled "platform":
➜ kubectl get nodes -l app=platform
NAME STATUS ROLES AGE VERSION
ip-192-168-42-2.sa-east-1.compute.internal Ready <none> 4h45m v1.22.15-eks-fb459a0
Does anyone have any idea what I did wrong?
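(For reference, and not part of the original question: a minimal way to double-check which zone each node actually reports, assuming the cluster uses the standard well-known topology.kubernetes.io/zone label that the constraint keys on, is the -L column output of kubectl get nodes.)
# hedged verification sketch - shows the zone label as an extra column
kubectl get nodes -L topology.kubernetes.io/zone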

Related

Remove nodeSelectorTerms param in manifest deployment

I use this manifest configuration to deploy a registry into a 3-node Kubernetes cluster:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv1
  namespace: registry-space
spec:
  capacity:
    storage: 5Gi # specify your own size
  volumeMode: Filesystem
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /opt/registry # can be any path
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kubernetes2
  accessModes:
  - ReadWriteMany # only 1 node will read/write on the path.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv1-claim
  namespace: registry-space
spec: # should match specs added in the PersistentVolume
  accessModes:
  - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: private-repository-k8s
  namespace: registry-space
  labels:
    app: private-repository-k8s
spec:
  replicas: 1
  selector:
    matchLabels:
      app: private-repository-k8s
  template:
    metadata:
      labels:
        app: private-repository-k8s
    spec:
      volumes:
      - name: certs-vol
        hostPath:
          path: /opt/certs
          type: Directory
      - name: task-pv-storage
        persistentVolumeClaim:
          claimName: pv1-claim # specify the PVC that you've created. PVC and Deployment must be in the same namespace.
      containers:
      - image: registry:2
        name: private-repository-k8s
        imagePullPolicy: IfNotPresent
        env:
        - name: REGISTRY_HTTP_TLS_CERTIFICATE
          value: "/opt/certs/registry.crt"
        - name: REGISTRY_HTTP_TLS_KEY
          value: "/opt/certs/registry.key"
        ports:
        - containerPort: 5000
        volumeMounts:
        - name: certs-vol
          mountPath: /opt/certs
        - name: task-pv-storage
          mountPath: /opt/registry
I manually created the directories /opt/certs and /opt/registry on every node.
But when I try to deploy the manifest without hardcoded nodeSelectorTerms on the control plane, I get an error:
kubernetes#kubernetes1:/opt/registry$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-58dbc876ff-fsjd5 1/1 Running 1 (74m ago) 84m
kube-system calico-node-5brzt 1/1 Running 1 (73m ago) 84m
kube-system calico-node-nph9n 1/1 Running 1 (76m ago) 84m
kube-system calico-node-pcd74 1/1 Running 1 (74m ago) 84m
kube-system calico-node-ph2ht 1/1 Running 1 (76m ago) 84m
kube-system coredns-565d847f94-7pswp 1/1 Running 1 (74m ago) 105m
kube-system coredns-565d847f94-tlrfr 1/1 Running 1 (74m ago) 105m
kube-system etcd-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-apiserver-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-controller-manager-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-proxy-4slm4 1/1 Running 1 (76m ago) 86m
kube-system kube-proxy-4tnx2 1/1 Running 2 (74m ago) 105m
kube-system kube-proxy-9dgsj 1/1 Running 1 (73m ago) 85m
kube-system kube-proxy-cgr44 1/1 Running 1 (76m ago) 86m
kube-system kube-scheduler-kubernetes1 1/1 Running 2 (74m ago) 105m
registry-space private-repository-k8s-6d5d954b4f-xkmj5 0/1 Pending 0 4m55s
kubernetes#kubernetes1:/opt/registry$
Do you know how I can let Kubernetes decide where to deploy the pod?
It seems like your node has taints, hence pods are not getting scheduled. Can you try using this command to remove the taint from your node?
kubectl taint nodes <node-name> node-role.kubernetes.io/master-
or
kubectl taint nodes --all node-role.kubernetes.io/master-
To get the node name, use kubectl get nodes.
The user was able to get the pod scheduled after running the command below:
kubectl taint nodes kubernetes1 node-role.kubernetes.io/control-plane:NoSchedule-
Now the pod is failing due to CrashLoopBackOff, which implies the pod has been scheduled.
Can you please check if this pod is getting scheduled and running properly?
apiVersion: v1
kind: Pod
metadata:
  name: nginx1
  namespace: test
spec:
  containers:
  - name: webserver
    image: nginx:alpine
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: "64Mi"
        cpu: "200m"
      limits:
        memory: "128Mi"
        cpu: "350m"

nodeSelector doesn't match the target node

I want to deploy a simple nginx on my master node.
Basically, if I use tolerations combined with nodeName, everything works fine:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: nginx
        name: myapp-container
      tolerations:
      - effect: NoExecute
        operator: Exists
      nodeName: master
The results:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myapp-deployment-56d5887b9-fw5mj 1/1 Running 0 50s 100.32.0.4 master <none> <none>
But the problem is that when I add a type=master label to my node and use nodeSelector instead of nodeName, the deployment stays in the Pending state!
Here are my steps:
Add the label to my node: k label node master type=master
Check the node label:
$ k get no --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master Ready control-plane 65d v1.24.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=,type=master
Apply my new yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: nginx
        name: myapp-container
      tolerations:
      - effect: NoExecute
        operator: Exists
      nodeSelector:
        type: master
Check the state:
$ k get po
NAME READY STATUS RESTARTS AGE
myapp-deployment-544784ff98-2qf7z 0/1 Pending 0 3s
Describe it:
Name: myapp-deployment-544784ff98-2qf7z
Namespace: default
Priority: 0
Node: <none>
Labels: app=myapp
pod-template-hash=544784ff98
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/myapp-deployment-544784ff98
Containers:
myapp-container:
Image: nginx
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lbtsv (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-lbtsv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: type=master
Tolerations: :NoExecute op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 111s default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Where am I wrong? What is my problem?
P.S: kubernetes version:
Client Version: v1.24.1
Kustomize Version: v4.5.4
Server Version: v1.24.1
Check your master node; it might have a taint set to NoSchedule:
kubectl describe node <Node name> | grep Taint
If you want to run a pod on the master node, use this config:
tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
nodeSelector:
  node-role.kubernetes.io/master: ""
Read more about the taint and toleration concept: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
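As a side note, a hedged one-liner to list the taints on every node at once (plain kubectl custom-columns output, nothing specific to this cluster):
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints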
Well, thanks to #Harsh, I finally found the answer:
First I get the taint on my master node:
$ kubectl describe node master | grep Taint
Taints: node-role.kubernetes.io/control-plane:NoSchedule
As you can see, the taint effect here is NoSchedule, NOT the NoExecute I used before!
So the configuration would be like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: nginx
        name: myapp-container
      tolerations:
      - effect: "NoSchedule" # just change this
        operator: "Exists"
      nodeSelector:
        type: master
And now you can see everything is good!
NAME READY STATUS RESTARTS AGE
myapp-deployment-79676c54d4-grm94 1/1 Running 0 7s

Why do pods created by a Deployment keep running on a NotReady node all the time?

I have three nodes. When I shut down cdh-k8s-3.novalocal, the pods keep running on it all the time:
# kubectl get node
NAME STATUS ROLES AGE VERSION
cdh-k8s-1.novalocal Ready control-plane,master 15d v1.20.0
cdh-k8s-2.novalocal Ready <none> 9d v1.20.0
cdh-k8s-3.novalocal NotReady <none> 9d v1.20.0
# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-66b6c48dd5-5jtqv 1/1 Running 0 3h28m 10.244.26.110 cdh-k8s-3.novalocal <none> <none>
nginx-deployment-66b6c48dd5-fntn4 1/1 Running 0 3h28m 10.244.26.108 cdh-k8s-3.novalocal <none> <none>
nginx-deployment-66b6c48dd5-vz7hr 1/1 Running 0 3h28m 10.244.26.109 cdh-k8s-3.novalocal <none> <none>
My YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
# kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 0/3 3 0 3h28m
I found this in the docs:
DaemonSet pods are created with NoExecute tolerations for the following taints with no tolerationSeconds:
node.kubernetes.io/unreachable
node.kubernetes.io/not-ready
This ensures that DaemonSet pods are never evicted due to these problems.
But that is about DaemonSets, not Deployments.
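For Deployment pods, the default behaviour on clusters with taint-based evictions (the default in recent versions) is that node.kubernetes.io/not-ready and node.kubernetes.io/unreachable NoExecute tolerations with tolerationSeconds: 300 are added automatically, so pods on a NotReady node are only evicted after roughly five minutes. A hedged sketch of how that window can be shortened in the pod template (the 30-second value is illustrative, not taken from the question):
# hedged sketch: add under spec.template.spec of the Deployment
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30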

How to get logs of deployment from Kubernetes?

I am creating an InfluxDB deployment in a Kubernetes cluster (v1.15.2). This is my YAML file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    spec:
      containers:
      - name: influxdb
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-influxdb-amd64:v1.5.2
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
      volumes:
      - name: influxdb-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-influxdb
  name: monitoring-influxdb
  namespace: kube-system
spec:
  ports:
  - port: 8086
    targetPort: 8086
  selector:
    k8s-app: influxdb
And this is the deployment status:
$ kubectl get deployment -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
coredns 1/1 1 1 163d
kubernetes-dashboard 1/1 1 1 164d
monitoring-grafana 0/1 0 0 12m
monitoring-influxdb 0/1 0 0 11m
Now I've been waiting 30 minutes and there is still no pod available. How do I check the deployment log from the command line? I cannot access the Kubernetes dashboard right now. I am searching for a command to get the pod logs, but there is no pod available yet. I already tried adding a label to the node:
kubectl label nodes azshara-k8s03 k8s-app=influxdb
This is my deployment's describe output:
$ kubectl describe deployments monitoring-influxdb -n kube-system
Name: monitoring-influxdb
Namespace: kube-system
CreationTimestamp: Wed, 04 Mar 2020 11:15:52 +0800
Labels: k8s-app=influxdb
task=monitoring
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"monitoring-influxdb","namespace":"kube-system"...
Selector: k8s-app=influxdb,task=monitoring
Replicas: 1 desired | 0 updated | 0 total | 0 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: k8s-app=influxdb
task=monitoring
Containers:
influxdb:
Image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-influxdb-amd64:v1.5.2
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/data from influxdb-storage (rw)
Volumes:
influxdb-storage:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
OldReplicaSets: <none>
NewReplicaSet: <none>
Events: <none>
This is another way to get logs:
$ kubectl -n kube-system logs -f deployment/monitoring-influxdb
error: timed out waiting for the condition
There is no output for this command:
kubectl logs --selector k8s-app=influxdb
These are all my pods in the kube-system namespace:
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-569fd64d84-5q5pj 1/1 Running 0 46h
kubernetes-dashboard-6466b68b-z6z78 1/1 Running 0 11h
traefik-ingress-controller-hx4xd 1/1 Running 0 11h
kubectl logs deployment/<name-of-deployment> # logs of deployment
kubectl logs -f deployment/<name-of-deployment> # follow logs
You can try kubectl describe deploy monitoring-influxdb to get a high-level view of the deployment; there may be some useful information there.
For more detailed logs, first get the pods: kubectl get po
Then, request the pod logs: kubectl logs <pod-name>
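When a Deployment shows 0/1 READY with no pods at all (as in the output above), the ReplicaSet and namespace events are often more informative than pod logs. A hedged sketch using the labels and namespace from this question:
kubectl -n kube-system describe replicaset -l k8s-app=influxdb   # if a ReplicaSet was created at all
kubectl -n kube-system get events --sort-by=.lastTimestamp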
Adding references to two great tools that might help you view cluster logs:
If you wish to view logs from your terminal without using a "heavy" 3rd-party logging solution, I would consider using K9s, a great CLI tool that helps you get control over your cluster.
If you are not bound only to the CLI and still want to run locally, I would recommend Lens.

NetworkPolicy cannot restrict ingress from UI

I have a Flask service (6 replicas) and a UI (3 replicas) deployed using a kind: Deployment, but when I add a Calico NetworkPolicy like this:
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: application-network-policy
  namespace: team-prod-xyz
  labels:
    app: application-network-policy
spec:
  podSelector:
    matchLabels:
      app: xyz-svc
      run: xyz-svc
  ingress:
  - ports:
    - port: 8000
    from:
    - podSelector:
        matchLabels:
          app: xyz-ui
  egress:
  - {}
  policyTypes:
  - Ingress
  - Egress
If I access my Flask service directly, it returns:
504 Gateway Time-out
nginx/1.15.3
which is probably expected, but my UI cannot hit the endpoints either.
Why is that?
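One hedged way to tell whether the policy itself (rather than the ingress path) is blocking the traffic is to exec into one of the UI pods and call the backend service directly; the pod and service names below are taken from the listing in EDIT 2, the namespace is the one from the policy above, and wget is assumed to exist in the UI image:
kubectl -n team-prod-xyz exec -it xyz-ui-7d6f44b57b-8s4mq -- wget -qO- --timeout=5 http://xyz-prod-service:8000/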
EDIT 2: Kubernetes and Ingress Information
Kubernetes Version -
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
NAME READY STATUS RESTARTS AGE
pod/xyz-mongodb-replicaset-0 1/1 Running 0 10d
pod/xyz-mongodb-replicaset-1 1/1 Running 0 7d
pod/xyz-mongodb-replicaset-2 1/1 Running 0 6d23h
pod/xyz-svc-7b589fbd4-25qd6 1/1 Running 0 20h
pod/xyz-svc-7b589fbd4-9n8jh 1/1 Running 0 20h
pod/xyz-svc-7b589fbd4-r5q9g 1/1 Running 0 20h
pod/xyz-ui-7d6f44b57b-8s4mq 1/1 Running 0 3d20h
pod/xyz-ui-7d6f44b57b-bl8r6 1/1 Running 0 3d20h
pod/xyz-ui-7d6f44b57b-jwhc2 1/1 Running 0 3d20h
pod/mongodb-backup-check 1/1 Running 0 20h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/xyz-mongodb-replicaset ClusterIP None <none> 27017/TCP 10d
service/xyz-prod-service ClusterIP 10.3.92.123 <none> 8000/TCP 20h
service/xyz-prod-ui ClusterIP 10.3.49.132 <none> 80/TCP 10d
--Deployment--
--Replicasset--
--Statefulset--
My ingress looks like -
Name: xyz-prod-svc
Namespace: prod-xyz
Address:
Default backend: default-http-backend:80 (<none>)
TLS:
prod terminates xyz.prod.domain.com
Rules:
Host Path Backends
---- ---- --------
xyz.prod.domain.com
/ xyz-prod-u:80 (10.7.2.4:80,10.7.4.22:80,10.7.5.24:80)
/project xyz-prod-servic:8000 (10.7.2.15:8000,10.7.5.10:8000,10.7.5.10:8000 + 3 more...)
/trigger xyz-prod-servic:8000 (10.7.2.15:8000,10.7.5.10:8000,10.7.5.10:8000 + 3 more...)
/kpi xyz-prod-servic:8000 (10.7.2.15:8000,10.7.5.10:8000,10.7.5.10:8000 + 3 more...)
/feedback xyz-prod-servic:8000 (10.7.2.15:8000,10.7.5.10:8000,10.7.5.10:8000 + 3 more...)
Do I have to specify my Ingress in the podSelector option of my Network Policy?
So far my Network Policy looks like this -
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: application-network-policy
  namespace: app-prod-xyz
  labels:
    app: application-network-policy
spec:
  podSelector:
    matchLabels:
      run: xyz-svc
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: xyz-ui
    - podSelector:
        matchLabels:
          app: application-health-check
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: xyz-ui
    - podSelector:
        matchLabels:
          app: xyz-mongodb-replicaset
    - podSelector:
        matchLabels:
          app: mongodb-replicaset
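Regarding the question above about specifying the Ingress in podSelector: podSelector only chooses which pods the policy applies to, so traffic that reaches xyz-svc through the ingress controller has to be allowed in the from clause instead. A hedged sketch of such an extra rule (the namespace label is an assumption and depends on how your ingress controller is installed):
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx   # assumed label on the ingress controller's namespace; adjust to your setup
    ports:
    - port: 8000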
EDIT 1: I learned that we need to expose port 8000 using a config map before the network policy.
EDIT 3: By UI I mean the deployment done with the Node image. I have to check whether the request is being sent through the UI pod or directly to the svc pod.