Kubernetes daemonset creating two pods instead of one (expected) - kubernetes

I have the following local 2-node kubernetes cluster-
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srl1 Ready control-plane,master 2d18h v1.21.2 xxx.xxx.12.58 <none> Ubuntu 20.04.2 LTS 5.4.0-80-generic docker://20.10.7
srl2 Ready <none> 2d18h v1.21.3 xxx.xxx.80.72 <none> Ubuntu 18.04.2 LTS 5.4.0-80-generic docker://20.10.2
I am trying to deploy an application on using a cluster creation python scirpt(https://github.com/hydro-project/cluster/blob/master/hydro/cluster/create_cluster.py)
When it tries to create a routing node using apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE, body=yml) it is expected that it should create a single pod from the routing-ds.yaml (given below) file and assign it to the routing daemonset (kind). However as you can see, it is creating two routing pods instead of one on every physical node. (FYI-my master can schedule pod)
akazad#srl1:~/hydro-project/cluster$ kubectl get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/management-pod 1/1 Running 0 25m 192.168.190.77 srl2 <none> <none>
default pod/monitoring-pod 1/1 Running 0 25m 192.168.120.71 srl1 <none> <none>
default pod/routing-nodes-9q7dr 1/1 Running 0 24m xxx.xxx.12.58 srl1 <none> <none>
default pod/routing-nodes-kfbnv 1/1 Running 0 24m xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/calico-kube-controllers-7676785684-tpz7q 1/1 Running 0 2d19h 192.168.120.65 srl1 <none> <none>
kube-system pod/calico-node-lnxtb 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/calico-node-mdvpd 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/coredns-558bd4d5db-vfghf 1/1 Running 0 2d19h 192.168.120.66 srl1 <none> <none>
kube-system pod/coredns-558bd4d5db-x7jhj 1/1 Running 0 2d19h xxx.xxx.120.67 srl1 <none> <none>
kube-system pod/etcd-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-apiserver-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-controller-manager-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-l8fds 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-szrng 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/kube-scheduler-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/controller-6b78bff7d9-t7gjr 1/1 Running 0 2d19h 192.168.190.65 srl2 <none> <none>
metallb-system pod/speaker-qsqnc 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/speaker-s4pp8 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d19h <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 2d19h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
default daemonset.apps/routing-nodes 2 2 2 2 2 <none> 24m routing-container akazad1/srlanna:v2 role=routing
kube-system daemonset.apps/calico-node 2 2 2 2 2 kubernetes.io/os=linux 2d19h calico-node calico/node:v3.14.2 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 2 2 2 2 2 kubernetes.io/os=linux 2d19h kube-proxy k8s.gcr.io/kube-proxy:v1.21.3 k8s-app=kube-proxy
metallb-system daemonset.apps/speaker 2 2 2 2 2 kubernetes.io/os=linux 2d19h speaker quay.io/metallb/speaker:v0.10.2 app=metallb,component=speaker
However, when it is directly creating pods from the management-pod.yaml (given below), it is creating one as expected.
Why the dasemonset is creating two pods instead of one?
Code segment where it is supposed to create a daemonset of type routing node
for i in range(len(kinds)):
kind = kinds[i]
# Create should only be true when the DaemonSet is being created for the
# first time -- i.e., when this is called from create_cluster. After that,
# we can basically ignore this because the DaemonSet will take care of
# adding pods to created nodes.
if create:
fname = 'yaml/ds/%s-ds.yml' % kind
yml = util.load_yaml(fname, prefix)
for container in yml['spec']['template']['spec']['containers']:
env = container['env']
util.replace_yaml_val(env, 'ROUTING_IPS', route_str)
util.replace_yaml_val(env, 'ROUTE_ADDR', route_addr)
util.replace_yaml_val(env, 'SCHED_IPS', sched_str)
util.replace_yaml_val(env, 'FUNCTION_ADDR', function_addr)
util.replace_yaml_val(env, 'MON_IPS', mon_str)
util.replace_yaml_val(env, 'MGMT_IP', management_ip)
util.replace_yaml_val(env, 'SEED_IP', seed_ip)
apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE,
body=yml)
# Wait until all pods of this kind are running
res = []
while len(res) != expected_counts[i]:
res = util.get_pod_ips(client, 'role='+kind, is_running=True)
pods = client.list_namespaced_pod(namespace=util.NAMESPACE,
label_selector='role=' +
kind).items
created_pods = get_current_pod_container_pairs(pods)
I have removed the nodeSelector from all the yaml files as I am running it on bare-metal cluster.
1 routing-ds.yaml
14
15 apiVersion: apps/v1
16 kind: DaemonSet
17 metadata:
18 name: routing-nodes
19 labels:
20 role: routing
21 spec:
22 selector:
23 matchLabels:
24 role: routing
25 template:
26 metadata:
27 labels:
28 role: routing
29 spec:
30 #nodeSelector:
31 # role: routing
32
33 hostNetwork: true
34 containers:
35 - name: routing-container
36 image: akazad1/srlanna:v2
37 env:
38 - name: SERVER_TYPE
39 value: r
40 - name: MON_IPS
41 value: MON_IPS_DUMMY
42 - name: REPO_ORG
43 value: hydro-project
44 - name: REPO_BRANCH
45 value: master
2 management-pod.yaml
15 apiVersion: v1
16 kind: Pod
17 metadata:
18 name: management-pod
19 labels:
20 role: management
21 spec:
22 restartPolicy: Never
23 containers:
24 - name: management-container
25 image: hydroproject/management
26 env:
27 #- name: AWS_ACCESS_KEY_ID
28 #value: ACCESS_KEY_ID_DUMMY
29 #- name: AWS_SECRET_ACCESS_KEY
30 #value: SECRET_KEY_DUMMY
31 #- name: KOPS_STATE_STORE
32 # value: KOPS_BUCKET_DUMMY
33 - name: HYDRO_CLUSTER_NAME
34 value: CLUSTER_NAME
35 - name: REPO_ORG
36 value: hydro-project
37 - name: REPO_BRANCH
38 value: master
39 - name: ANNA_REPO_ORG
40 value: hydro-project
41 - name: ANNA_REPO_BRANCH
42 value: master
43 # nodeSelector:
44 #role: general

May you have misunderstanding you have to use the kind: deployment if you want to manage the replicas (PODs - 1,2,3...n) on Kubernetes.
Daemon set behavior is like it will run the POD on each available node in the cluster.
So inside your cluster, there are two nodes so daemon set will run the POD on each available node. If you will increase the Node deamon set will auto-create the POD on that node also.
kind: Pod
will create the single POD only which is its default behavior.
The following are some of the Kubernetes Objects:
pods
ReplicationController (Manages Pods)
Deployment (Manages Pods)
StatefulSets
DaemonSets
You can read more at : https://chkrishna.medium.com/kubernetes-objects-e0a8b93b5cdc
Official document : https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/
If you want to manage POD using any type of controller kind: deployment is best. you can scale the replicas and scale down. You can also mention the replicas in YAML 1,2,3 and that way it will be running on cluster based on number.

Related

No matches for kind "Ingress" in version "networking.k8s.io/v1"

I am getting an error message indicating that the Kubernetes resource defined in the file "argocdingress.yaml" is of type "Ingress", but the version of the Kubernetes API being used (networking.k8s.io/v1) does not have a resource type called "Ingress".*
error: unable to recognize "argocdingress.yaml": no matches for kind "Ingress" in version "networking.k8s.io/v1"
[root#uat-master01 yaml]# kubectl apply -f argocdingress.yaml
error: unable to recognize "argocdingress.yaml": no matches for kind "Ingress" in version "networking.k8s.io/v1"
argocdingress.yaml
apiVersion: networking.k8s.io/v1kind: Ingressmetadata:name: argocdingressnamespace: argocdannotations:nginx.ingress.kubernetes.io/rewrite-target: /spec:rules:
host: argocd.fonepay.comhttp:paths:
path: /pathType: Prefixpath: /pathType: Prefixbackend:your textservice:name: argocd-serverport:name: http
[root#uat-master01 yaml]# kubectl get all -n argocd
NAME READY STATUS RESTARTS AGEpod/argocd-redis-66b48966cb-v4vwg 1/1 Running 0 19hpod/argocd-repo-server-7d956f8689-bl5dr 1/1 Running 0 19hpod/argocd-server-598494dbc7-pd2g5 1/1 Running 0 19hpod/argocd-application-controller-0 1/1 Running 0 19hpod/argocd-dex-server-8b975c7cc-42b26 1/1 Running 0 19h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGEservice/argocd-applicationset-controller ClusterIP 10.43.157.126 <none> 7000/TCP,8080/TCP 19hservice/argocd-dex-server ClusterIP 10.43.120.182 <none> 5556/TCP,5557/TCP,5558/TCP 19hservice/argocd-metrics ClusterIP 10.43.202.235 <none> 8082/TCP 19hservice/argocd-notifications-controller-metrics ClusterIP 10.43.99.138 <none> 9001/TCP 19hservice/argocd-redis ClusterIP 10.43.118.67 <none> 6379/TCP 19hservice/argocd-repo-server ClusterIP 10.43.9.59 <none> 8081/TCP,8084/TCP 19hservice/argocd-server ClusterIP 10.43.251.211 <none> 80/TCP,443/TCP 19hservice/argocd-server-metrics ClusterIP 10.43.5.168 <none> 8083/TCP 19h
NAME READY UP-TO-DATE AVAILABLE AGEdeployment.apps/argocd-redis 1/1 1 1 19hdeployment.apps/argocd-repo-server 1/1 1 1 19hdeployment.apps/argocd-server 1/1 1 1 19hdeployment.apps/argocd-dex-server 1/1 1 1 19h
NAME DESIRED CURRENT READY AGEreplicaset.apps/argocd-redis-66b48966cb 1 1 1 19hreplicaset.apps/argocd-repo-server-7d956f8689 1 1 1 19hreplicaset.apps/argocd-server-598494dbc7 1 1 1 19hreplicaset.apps/argocd-dex-server-8b975c7cc 1 1 1 19h
NAME READY AGEstatefulset.apps/argocd-application-controller 1/1 19h
nodeInfo:architecture: amd64bootID: 031dadd6-5586-453e-8add-ade7591791b7containerRuntimeVersion: containerd://1.3.3-k3s2kernelVersion: 3.10.0-1160.81.1.el7.x86_64kubeProxyVersion: v1.18.9+k3s1kubeletVersion: v1.18.9+k3s1machineID: 091bbe3a46464c0bac4082cd351b8db4operatingSystem: linuxosImage: CentOS Linux 7 (Core)systemUUID: CADD0442-7251-AE77-6375-D151A94CF9A7
Check by updating
apiVersion: networking.k8s.io/v1

Kubernetes Dashboard CrashLoopBackOff: timeout error on Raspberry Pi cluster

Should be a simple task, I simply want to run the Kubernetes Dashboard on a clean install of Kubernetes on a Raspberry Pi cluster.
What I've done:
Setup the initial cluster (hostname, static ip, cgroup, swapspace, install and configure docker, install kubernetes, setup kubernetes network and join nodes)
I have flannel installed
I have applied the dashboard
Bunch of random testing trying to figure this out
Obviously, as seen below, the container in the dashboard pod is not working because it cannot access kubernetes-dashboard-csrf. I have no idea why this cannot be accessed, my only thought is that I missed a step when setting up the cluster. I've followed about 6 different guides without success, prioritizing the official guide. I have also seen quite a few people having the same or similar issues that most have not posted a resolution. Thanks!
Nodes: kubectl get nodes
NAME STATUS ROLES AGE VERSION
gus3 Ready <none> 346d v1.23.1
juliet3 Ready <none> 346d v1.23.1
shawn4 Ready <none> 346d v1.23.1
vick4 Ready control-plane,master 346d v1.23.1
All Pods: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-74ff55c5b-7j2xg 1/1 Running 27 346d
kube-system coredns-74ff55c5b-cb2x8 1/1 Running 27 346d
kube-system etcd-vick4 1/1 Running 2 169m
kube-system kube-apiserver-vick4 1/1 Running 2 169m
kube-system kube-controller-manager-vick4 1/1 Running 2 169m
kube-system kube-flannel-ds-gclmp 1/1 Running 0 11m
kube-system kube-flannel-ds-hshjv 1/1 Running 0 12m
kube-system kube-flannel-ds-kdd4w 1/1 Running 0 11m
kube-system kube-flannel-ds-wzhkt 1/1 Running 0 10m
kube-system kube-proxy-4t25v 1/1 Running 26 346d
kube-system kube-proxy-b6vbx 1/1 Running 26 346d
kube-system kube-proxy-jgj4s 1/1 Running 27 346d
kube-system kube-proxy-n65sl 1/1 Running 26 346d
kube-system kube-scheduler-vick4 1/1 Running 2 169m
kubernetes-dashboard dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 77m
kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 18 77m
Resources: kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 79m
pod/kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 19 79m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 172.20.0.191 <none> 8000/TCP 79m
service/kubernetes-dashboard ClusterIP 172.20.0.15 <none> 443/TCP 79m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 79m
deployment.apps/kubernetes-dashboard 0/1 1 0 79m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-5b8896d7fc 1 1 1 79m
replicaset.apps/kubernetes-dashboard-897c7599f 1 1 0 79m
Notice CrashLoopBackOff
Pod Details: kubectl describe pods kubernetes-dashboard-897c7599f-qss5p -n kubernetes-dashboard
Name: kubernetes-dashboard-897c7599f-qss5p
Namespace: kubernetes-dashboard
Priority: 0
Node: shawn4/192.168.10.71
Start Time: Fri, 17 Dec 2021 18:52:15 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=897c7599f
Annotations: <none>
Status: Running
IP: 172.19.1.75
IPs:
IP: 172.19.1.75
Controlled By: ReplicaSet/kubernetes-dashboard-897c7599f
Containers:
kubernetes-dashboard:
Container ID: docker://894a354e40ca1a95885e149dcd75415e0f186ead3f2e05ec0787f4b1c7a29622
Image: kubernetesui/dashboard:v2.4.0
Image ID: docker-pullable://kubernetesui/dashboard#sha256:526850ae4ea9aba360e72b6df69fd3126b129d446efe83ac5250282b85f95b7f
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 17 Dec 2021 20:10:19 +0000
Finished: Fri, 17 Dec 2021 20:10:49 +0000
Ready: False
Restart Count: 19
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-wq9m8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-wq9m8:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-wq9m8
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 21s (x327 over 79m) kubelet Back-off restarting failed container
Logs: kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p
2021/12/17 20:10:19 Starting overwatch
2021/12/17 20:10:19 Using namespace: kubernetes-dashboard
2021/12/17 20:10:19 Using in-cluster config to connect to apiserver
2021/12/17 20:10:19 Using secret token for csrf signing
2021/12/17 20:10:19 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://172.20.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0x400055fae8)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x350
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:502 +0x8c
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:470 +0x40
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:551
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:95 +0x1dc
If you need any more information please ask!
UPDATE 12/29/21:
Fixed this issue by reinstalling the cluster to the newest versions of Kubernetes and Ubuntu.
Turned out there were several issues:
I was using Ubuntu Buster which is deprecated.
My client/server Kubernetes versions were +/-0.3 out of sync
I was following outdated instructions
I reinstalled the whole cluster following Kubernetes official guide and, with a few snags along the way, it works!

Why can't find osd pod in kubernetes after deploying rook-ceph?

Tried to install rook-ceph on kubernetes as this guide:
https://rook.io/docs/rook/v1.3/ceph-quickstart.html
git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster.yaml
When I check all the pods
$ kubectl -n rook-ceph get pod
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-9c2z9 3/3 Running 0 23m
csi-cephfsplugin-provisioner-7678bcfc46-s67hq 5/5 Running 0 23m
csi-cephfsplugin-provisioner-7678bcfc46-sfljd 5/5 Running 0 23m
csi-cephfsplugin-smmlf 3/3 Running 0 23m
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq 6/6 Running 0 23m
csi-rbdplugin-provisioner-fbd45b7c8-rp85z 6/6 Running 0 23m
csi-rbdplugin-s67lw 3/3 Running 0 23m
csi-rbdplugin-zq4k5 3/3 Running 0 23m
rook-ceph-mon-a-canary-954dc5cd9-5q8tk 1/1 Running 0 2m9s
rook-ceph-mon-b-canary-b9d6f5594-mcqwc 1/1 Running 0 2m9s
rook-ceph-mon-c-canary-78b48dbfb7-z2t7d 0/1 Pending 0 2m8s
rook-ceph-operator-757d6db48d-x27lm 1/1 Running 0 25m
rook-ceph-tools-75f575489-znbbz 1/1 Running 0 7m45s
rook-discover-gq489 1/1 Running 0 24m
rook-discover-p9zlg 1/1 Running 0 24m
$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
No resources found in rook-ceph namespace.
Do some other operation
$ kubectl taint nodes $(hostname) node-role.kubernetes.io/master:NoSchedule-
$ kubectl -n rook-ceph-system delete pods rook-ceph-operator-757d6db48d-x27lm
Create file system
$ kubectl create -f filesystem.yaml
Check again
$ kubectl get pods -n rook-ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-9c2z9 3/3 Running 0 135m 192.168.0.53 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-s67hq 5/5 Running 0 135m 10.1.2.6 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-sfljd 5/5 Running 0 135m 10.1.2.5 kube3 <none> <none>
csi-cephfsplugin-smmlf 3/3 Running 0 135m 192.168.0.52 kube2 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq 6/6 Running 0 135m 10.1.1.6 kube2 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-rp85z 6/6 Running 0 135m 10.1.1.5 kube2 <none> <none>
csi-rbdplugin-s67lw 3/3 Running 0 135m 192.168.0.52 kube2 <none> <none>
csi-rbdplugin-zq4k5 3/3 Running 0 135m 192.168.0.53 kube3 <none> <none>
rook-ceph-crashcollector-kube2-6d95bb9c-r5w7p 0/1 Init:0/2 0 110m <none> kube2 <none> <none>
rook-ceph-crashcollector-kube3-644c849bdb-9hcvg 0/1 Init:0/2 0 110m <none> kube3 <none> <none>
rook-ceph-mon-a-canary-954dc5cd9-6ccbh 1/1 Running 0 75s 10.1.2.130 kube3 <none> <none>
rook-ceph-mon-b-canary-b9d6f5594-k85w5 1/1 Running 0 74s 10.1.1.74 kube2 <none> <none>
rook-ceph-mon-c-canary-78b48dbfb7-kfzzx 0/1 Pending 0 73s <none> <none> <none> <none>
rook-ceph-operator-757d6db48d-nlh84 1/1 Running 0 110m 10.1.2.28 kube3 <none> <none>
rook-ceph-tools-75f575489-znbbz 1/1 Running 0 119m 10.1.1.14 kube2 <none> <none>
rook-discover-gq489 1/1 Running 0 135m 10.1.1.3 kube2 <none> <none>
rook-discover-p9zlg 1/1 Running 0 135m 10.1.2.4 kube3 <none> <none>
Can't see pod as rook-ceph-osd-.
And rook-ceph-mon-c-canary-78b48dbfb7-kfzzx pod is always Pending.
If install toolbox as
https://rook.io/docs/rook/v1.3/ceph-toolbox.html
$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
Inside the container, check the ceph status
[root#rook-ceph-tools-75f575489-znbbz /]# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
[errno 2] error connecting to the cluster
It's running on Ubuntu 16.04.6.
Deploy again
$ kubectl -n rook-ceph get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-4tww8 3/3 Running 0 3m38s 192.168.0.52 kube2 <none> <none>
csi-cephfsplugin-dbbfb 3/3 Running 0 3m38s 192.168.0.53 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-8kt96 5/5 Running 0 3m37s 10.1.2.6 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-kq6vv 5/5 Running 0 3m38s 10.1.1.6 kube2 <none> <none>
csi-rbdplugin-4qrqn 3/3 Running 0 3m39s 192.168.0.53 kube3 <none> <none>
csi-rbdplugin-dqx9z 3/3 Running 0 3m39s 192.168.0.52 kube2 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-7f57t 6/6 Running 0 3m39s 10.1.2.5 kube3 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-9zwhb 6/6 Running 0 3m39s 10.1.1.5 kube2 <none> <none>
rook-ceph-mon-a-canary-954dc5cd9-rgqpg 1/1 Running 0 2m40s 10.1.1.7 kube2 <none> <none>
rook-ceph-mon-b-canary-b9d6f5594-n2pwc 1/1 Running 0 2m35s 10.1.2.8 kube3 <none> <none>
rook-ceph-mon-c-canary-78b48dbfb7-fv46f 0/1 Pending 0 2m30s <none> <none> <none> <none>
rook-ceph-operator-757d6db48d-2m25g 1/1 Running 0 6m27s 10.1.2.3 kube3 <none> <none>
rook-discover-lpsht 1/1 Running 0 5m15s 10.1.1.3 kube2 <none> <none>
rook-discover-v4l77 1/1 Running 0 5m15s 10.1.2.4 kube3 <none> <none>
Describe pending pod
$ kubectl describe pod rook-ceph-mon-c-canary-78b48dbfb7-fv46f -n rook-ceph
Name: rook-ceph-mon-c-canary-78b48dbfb7-fv46f
Namespace: rook-ceph
Priority: 0
Node: <none>
Labels: app=rook-ceph-mon
ceph_daemon_id=c
mon=c
mon_canary=true
mon_cluster=rook-ceph
pod-template-hash=78b48dbfb7
rook_cluster=rook-ceph
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/rook-ceph-mon-c-canary-78b48dbfb7
Containers:
mon:
Image: rook/ceph:v1.3.4
Port: 6789/TCP
Host Port: 0/TCP
Command:
/tini
Args:
--
sleep
3600
Environment:
CONTAINER_IMAGE: ceph/ceph:v14.2.9
POD_NAME: rook-ceph-mon-c-canary-78b48dbfb7-fv46f (v1:metadata.name)
POD_NAMESPACE: rook-ceph (v1:metadata.namespace)
NODE_NAME: (v1:spec.nodeName)
POD_MEMORY_LIMIT: node allocatable (limits.memory)
POD_MEMORY_REQUEST: 0 (requests.memory)
POD_CPU_LIMIT: node allocatable (limits.cpu)
POD_CPU_REQUEST: 0 (requests.cpu)
ROOK_CEPH_MON_HOST: <set to the key 'mon_host' in secret 'rook-ceph-config'> Optional: false
ROOK_CEPH_MON_INITIAL_MEMBERS: <set to the key 'mon_initial_members' in secret 'rook-ceph-config'> Optional: false
ROOK_POD_IP: (v1:status.podIP)
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/mon/ceph-c from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-65xtn (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
rook-config-override:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rook-config-override
Optional: false
rook-ceph-mons-keyring:
Type: Secret (a volume populated by a Secret)
SecretName: rook-ceph-mons-keyring
Optional: false
rook-ceph-log:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/rook-ceph/log
HostPathType:
rook-ceph-crash:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/rook-ceph/crash
HostPathType:
ceph-daemon-data:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/mon-c/data
HostPathType:
default-token-65xtn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-65xtn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22s (x3 over 84s) default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules.
Test mount
Create a nginx.yaml file
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx:1.7.9
ports:
- containerPort: 80
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumes:
- name: www
flexVolume:
driver: ceph.rook.io/rook
fsType: ceph
options:
fsName: myfs
clusterNamespace: rook-ceph
Deploy it and describe the pod detail
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m28s default-scheduler Successfully assigned default/nginx to kube2
Warning FailedMount 9m28s kubelet, kube2 Unable to attach or mount volumes: unmounted volumes=[www default-token-fnb28], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
Warning FailedMount 6m14s (x2 over 6m38s) kubelet, kube2 Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[default-token-fnb28 www]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
Warning FailedMount 4m6s (x23 over 9m13s) kubelet, kube2 Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
rook-ceph-mon-x pods have following affinity:
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: rook-ceph-mon
topologyKey: kubernetes.io/hostname
which doesn't allow for running 2 rook-ceph-mon pods on the same node.
Since you seem to have 3 nodes: 1 master and 2 workers, 2 pods get created, one on kube2 and one on kube3 node. kube1 is master node tainted as unschedulable so rook-ceph-mon-c cannot be scheduled there.
To solve it you can:
add one more worker node
remove NoSchedule taint with kubectl taint nodes kube1 key:NoSchedule-
change mon count to lower value

Horizontal Pod Autoscaler replicas based on the amount of the nodes in the cluster

Im looking for the solution that will scale out pods automatically when the nodes join the cluster and scale in back when the nodes are deleted.
We are running WebApp on the nodes and this require graceful pod eviction/termination when the node is scheduled to be disconnected.
I was checking the option of using the DaemonSet but since we are using Kops for the cluster rolling update it ignores DaemonSets evictions (flag "--ignore-daemionset" is not supported).
As a result the WebApp "dies" with the node which is not acceptable for our application.
The ability of HorizontalPodAutoscaler to overwrite the amount of replicas which are set in the deployment yaml could solve the problem.
I want to find the way to change the min/maxReplicas in HorizontalPodAutoscaler yaml dynamically based on the amount of nodes in the cluster.
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: MyWebApp
minReplicas: "Num of nodes in the cluster"
maxReplicas: "Num of nodes in the cluster"
Any ideas how to get the number of nodes and update HorizontalPodAutoscaler yaml in the cluster accordingly? Or any other solutions for the problem?
Have you tried usage of nodeSelector spec in daemonset yaml.
So if you have nodeselector set in yaml and just before drain if you remove the nodeselector label value from the node the daemonset should scale down gracefully also same when you add new node to cluster label the node with custom value and deamonset will scale up.
This works for me so you can try this and confirm with Kops
First : Label all you nodes with a custom label you will always have on your cluster
Example:
kubectl label nodes k8s-master-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-2 mylabel=allow_demon_set
kubectl label nodes k8s-node-3 mylabel=allow_demon_set
Then to your daemon set yaml add node selector.
Example.yaml used as below : Note added nodeselctor field
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd-elasticsearch
labels:
k8s-app: fluentd-logging
spec:
selector:
matchLabels:
name: fluentd-elasticsearch
template:
metadata:
labels:
name: fluentd-elasticsearch
spec:
nodeSelector:
mylabel: allow_demon_set
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: fluentd-elasticsearch
image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
So nodes are labeled as below
$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master-1 Ready master 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master-1,kubernetes.io/os=linux,mylable=allow_demon_set,node-role.kubernetes.io/master=
k8s-node-1 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,mylable=allow_demon_set
k8s-node-2 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,mylable=allow_demon_set
k8s-node-3 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-3,kubernetes.io/os=linux,mylable=allow_demon_set
Once you have correct yaml start the daemon set using it
$ kubectl create -f Example.yaml
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 1/1 Running 0 20s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 20s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 20s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 20s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylable=allow_demon_set 20s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Then before draining a node we can just remove the custom label from node and the pod-should scale down gracefully and then drain the node.
$ kubectl label nodes k8s-node-3 mylabel-
Check the daemonset and it should scale down
ubuntu#k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 0/1 Terminating 0 2m36s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 2m36s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 2m36s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 2m36s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 3 3 3 3 3 mylable=allow_demon_set 2m36s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Now again add the label to new node with same custom label when it is added to cluster and the deamonset will scale up
$ kubectl label nodes k8s-node-3 mylable=allow_demon_set
ubuntu#k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-22rsj 1/1 Running 0 2s 10.244.3.20 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 5m28s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 5m28s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 5m28s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylable=allow_demon_set 5m28s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Kindly confirm if this what you want to do and works with kops

kubernetes metrics server don't start

I try to connect in the dashboard of kubernetes.
I have the latest version of kubernetes v1.12 with kubeadm , in a server.
I download from github the metrics-server and run:
Kubctl create -f deploy/1.8+
but i get this error
kube-system metrics-server-5cbbc84f8c-tjfxd 0/1 Pending 0 12m
with out log to debug
error: the server doesn't have a resource type "logs"
I don't want to install heapster because is DEPRECATED.
UPDATE
Hello, and thanks.
i run the taint command i get:
error: at least one taint update is required
and the command
kubectl describe deployment metrics-server -n kube-system
i get this output:
Name: metrics-server
Namespace: kube-system
CreationTimestamp: Thu, 18 Oct 2018 14:34:42 +0000
Labels: k8s-app=metrics-server
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata": {"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"metrics-...
Selector: k8s-app=metrics-server
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: k8s-app=metrics-server
Service Account: metrics-server
Containers:
metrics-server:
Image: k8s.gcr.io/metrics-server-amd64:v0.3.1
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: metrics-server-5cbbc84f8c (1/1 replicas created)
Events: <none>
Command:
kubectl get nodes
The output for this is just the IP of the node, and nothing special.
Any ideas, or what to do to work the dashboard for kubernetes.
I suppose you are trying setup metrics-server on your master node.
If you issue kubectl describe deployment metrics-server -n kube-system I believe you will see something like this:
Name: metrics-server Namespace:
kube-system CreationTimestamp: Thu, 18 Oct 2018 15:57:34 +0000
Labels: k8s-app=metrics-server Annotations:
deployment.kubernetes.io/revision: 1 Selector:
k8s-app=metrics-server Replicas: 1 desired | 1 updated |
1 total | 0 available | 1 unavailable
But if you will describe your node you will see taint that prevent you from scheduling new pods on master node:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master-1 Ready master 17m v1.12.1
kubectl describe node kube-master-1
Name: kube-master-1
...
Taints: node-role.kubernetes.io/master:NoSchedule
You have to remove this taint:
kubectl taint node kube-master-1 node-role.kubernetes.io/master:NoSchedule-
node/kube-master-1 untainted
Result:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-xvc77 2/2 Running 0 20m
kube-system coredns-576cbf47c7-rj4wh 1/1 Running 0 21m
kube-system coredns-576cbf47c7-vsjsf 1/1 Running 0 21m
kube-system etcd-kube-master-1 1/1 Running 0 20m
kube-system kube-apiserver-kube-master-1 1/1 Running 0 20m
kube-system kube-controller-manager-kube-master-1 1/1 Running 0 20m
kube-system kube-proxy-xp5zh 1/1 Running 0 21m
kube-system kube-scheduler-kube-master-1 1/1 Running 0 20m
kube-system metrics-server-5cbbc84f8c-l2t76 1/1 Running 0 18m
But this is not the best approach. Good approach is to join worker and set up metrics-server there. There won't be any issues and there is no need to touch taint on master node.
Hope it will help you.
The above answer by "Vit" is correct, either remove taint from existing node group or create new node group without any taint.