I tried to install rook-ceph on Kubernetes following this guide:
https://rook.io/docs/rook/v1.3/ceph-quickstart.html
git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster.yaml
When I check all the pods
$ kubectl -n rook-ceph get pod
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-9c2z9 3/3 Running 0 23m
csi-cephfsplugin-provisioner-7678bcfc46-s67hq 5/5 Running 0 23m
csi-cephfsplugin-provisioner-7678bcfc46-sfljd 5/5 Running 0 23m
csi-cephfsplugin-smmlf 3/3 Running 0 23m
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq 6/6 Running 0 23m
csi-rbdplugin-provisioner-fbd45b7c8-rp85z 6/6 Running 0 23m
csi-rbdplugin-s67lw 3/3 Running 0 23m
csi-rbdplugin-zq4k5 3/3 Running 0 23m
rook-ceph-mon-a-canary-954dc5cd9-5q8tk 1/1 Running 0 2m9s
rook-ceph-mon-b-canary-b9d6f5594-mcqwc 1/1 Running 0 2m9s
rook-ceph-mon-c-canary-78b48dbfb7-z2t7d 0/1 Pending 0 2m8s
rook-ceph-operator-757d6db48d-x27lm 1/1 Running 0 25m
rook-ceph-tools-75f575489-znbbz 1/1 Running 0 7m45s
rook-discover-gq489 1/1 Running 0 24m
rook-discover-p9zlg 1/1 Running 0 24m
$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
No resources found in rook-ceph namespace.
I then removed the master taint and deleted the operator pod so that it restarts:
$ kubectl taint nodes $(hostname) node-role.kubernetes.io/master:NoSchedule-
$ kubectl -n rook-ceph-system delete pods rook-ceph-operator-757d6db48d-x27lm
Create file system
$ kubectl create -f filesystem.yaml
Check again
$ kubectl get pods -n rook-ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-9c2z9 3/3 Running 0 135m 192.168.0.53 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-s67hq 5/5 Running 0 135m 10.1.2.6 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-sfljd 5/5 Running 0 135m 10.1.2.5 kube3 <none> <none>
csi-cephfsplugin-smmlf 3/3 Running 0 135m 192.168.0.52 kube2 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq 6/6 Running 0 135m 10.1.1.6 kube2 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-rp85z 6/6 Running 0 135m 10.1.1.5 kube2 <none> <none>
csi-rbdplugin-s67lw 3/3 Running 0 135m 192.168.0.52 kube2 <none> <none>
csi-rbdplugin-zq4k5 3/3 Running 0 135m 192.168.0.53 kube3 <none> <none>
rook-ceph-crashcollector-kube2-6d95bb9c-r5w7p 0/1 Init:0/2 0 110m <none> kube2 <none> <none>
rook-ceph-crashcollector-kube3-644c849bdb-9hcvg 0/1 Init:0/2 0 110m <none> kube3 <none> <none>
rook-ceph-mon-a-canary-954dc5cd9-6ccbh 1/1 Running 0 75s 10.1.2.130 kube3 <none> <none>
rook-ceph-mon-b-canary-b9d6f5594-k85w5 1/1 Running 0 74s 10.1.1.74 kube2 <none> <none>
rook-ceph-mon-c-canary-78b48dbfb7-kfzzx 0/1 Pending 0 73s <none> <none> <none> <none>
rook-ceph-operator-757d6db48d-nlh84 1/1 Running 0 110m 10.1.2.28 kube3 <none> <none>
rook-ceph-tools-75f575489-znbbz 1/1 Running 0 119m 10.1.1.14 kube2 <none> <none>
rook-discover-gq489 1/1 Running 0 135m 10.1.1.3 kube2 <none> <none>
rook-discover-p9zlg 1/1 Running 0 135m 10.1.2.4 kube3 <none> <none>
I can't see any rook-ceph-osd-* pods.
And the rook-ceph-mon-c-canary-78b48dbfb7-kfzzx pod is always Pending.
I also installed the toolbox following
https://rook.io/docs/rook/v1.3/ceph-toolbox.html
$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
Inside the container, check the ceph status
[root@rook-ceph-tools-75f575489-znbbz /]# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
[errno 2] error connecting to the cluster
It's running on Ubuntu 16.04.6.
Deploy again
$ kubectl -n rook-ceph get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-4tww8 3/3 Running 0 3m38s 192.168.0.52 kube2 <none> <none>
csi-cephfsplugin-dbbfb 3/3 Running 0 3m38s 192.168.0.53 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-8kt96 5/5 Running 0 3m37s 10.1.2.6 kube3 <none> <none>
csi-cephfsplugin-provisioner-7678bcfc46-kq6vv 5/5 Running 0 3m38s 10.1.1.6 kube2 <none> <none>
csi-rbdplugin-4qrqn 3/3 Running 0 3m39s 192.168.0.53 kube3 <none> <none>
csi-rbdplugin-dqx9z 3/3 Running 0 3m39s 192.168.0.52 kube2 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-7f57t 6/6 Running 0 3m39s 10.1.2.5 kube3 <none> <none>
csi-rbdplugin-provisioner-fbd45b7c8-9zwhb 6/6 Running 0 3m39s 10.1.1.5 kube2 <none> <none>
rook-ceph-mon-a-canary-954dc5cd9-rgqpg 1/1 Running 0 2m40s 10.1.1.7 kube2 <none> <none>
rook-ceph-mon-b-canary-b9d6f5594-n2pwc 1/1 Running 0 2m35s 10.1.2.8 kube3 <none> <none>
rook-ceph-mon-c-canary-78b48dbfb7-fv46f 0/1 Pending 0 2m30s <none> <none> <none> <none>
rook-ceph-operator-757d6db48d-2m25g 1/1 Running 0 6m27s 10.1.2.3 kube3 <none> <none>
rook-discover-lpsht 1/1 Running 0 5m15s 10.1.1.3 kube2 <none> <none>
rook-discover-v4l77 1/1 Running 0 5m15s 10.1.2.4 kube3 <none> <none>
Describe pending pod
$ kubectl describe pod rook-ceph-mon-c-canary-78b48dbfb7-fv46f -n rook-ceph
Name: rook-ceph-mon-c-canary-78b48dbfb7-fv46f
Namespace: rook-ceph
Priority: 0
Node: <none>
Labels: app=rook-ceph-mon
ceph_daemon_id=c
mon=c
mon_canary=true
mon_cluster=rook-ceph
pod-template-hash=78b48dbfb7
rook_cluster=rook-ceph
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/rook-ceph-mon-c-canary-78b48dbfb7
Containers:
mon:
Image: rook/ceph:v1.3.4
Port: 6789/TCP
Host Port: 0/TCP
Command:
/tini
Args:
--
sleep
3600
Environment:
CONTAINER_IMAGE: ceph/ceph:v14.2.9
POD_NAME: rook-ceph-mon-c-canary-78b48dbfb7-fv46f (v1:metadata.name)
POD_NAMESPACE: rook-ceph (v1:metadata.namespace)
NODE_NAME: (v1:spec.nodeName)
POD_MEMORY_LIMIT: node allocatable (limits.memory)
POD_MEMORY_REQUEST: 0 (requests.memory)
POD_CPU_LIMIT: node allocatable (limits.cpu)
POD_CPU_REQUEST: 0 (requests.cpu)
ROOK_CEPH_MON_HOST: <set to the key 'mon_host' in secret 'rook-ceph-config'> Optional: false
ROOK_CEPH_MON_INITIAL_MEMBERS: <set to the key 'mon_initial_members' in secret 'rook-ceph-config'> Optional: false
ROOK_POD_IP: (v1:status.podIP)
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/mon/ceph-c from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-65xtn (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
rook-config-override:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rook-config-override
Optional: false
rook-ceph-mons-keyring:
Type: Secret (a volume populated by a Secret)
SecretName: rook-ceph-mons-keyring
Optional: false
rook-ceph-log:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/rook-ceph/log
HostPathType:
rook-ceph-crash:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/rook-ceph/crash
HostPathType:
ceph-daemon-data:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/mon-c/data
HostPathType:
default-token-65xtn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-65xtn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22s (x3 over 84s) default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules.
Test mount
Create an nginx.yaml file:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html
  volumes:
  - name: www
    flexVolume:
      driver: ceph.rook.io/rook
      fsType: ceph
      options:
        fsName: myfs
        clusterNamespace: rook-ceph
After deploying it, describing the pod shows:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m28s default-scheduler Successfully assigned default/nginx to kube2
Warning FailedMount 9m28s kubelet, kube2 Unable to attach or mount volumes: unmounted volumes=[www default-token-fnb28], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
Warning FailedMount 6m14s (x2 over 6m38s) kubelet, kube2 Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[default-token-fnb28 www]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
Warning FailedMount 4m6s (x23 over 9m13s) kubelet, kube2 Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
The rook-ceph-mon-x pods have the following anti-affinity rule:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: rook-ceph-mon
        topologyKey: kubernetes.io/hostname
which doesn't allow two rook-ceph-mon pods to run on the same node.
You seem to have 3 nodes: 1 master and 2 workers. Two mon pods get created, one on kube2 and one on kube3. kube1 is the master node and carries the node-role.kubernetes.io/master NoSchedule taint, so rook-ceph-mon-c cannot be scheduled there.
To solve it you can:
add one more worker node
remove the NoSchedule taint with kubectl taint nodes kube1 node-role.kubernetes.io/master:NoSchedule-
lower the mon count (mon.count in cluster.yaml)
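To see why exactly one mon stays Pending, here is a toy Python model of the rule above. It is only a sketch, not how the real scheduler is implemented: Node, schedule_mons and cluster are made-up names, and a node counts as usable only if it has no untolerated taint and no mon already on it, which is the effect of required anti-affinity on kubernetes.io/hostname.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    tainted: bool = False  # True models a NoSchedule taint the pod does not tolerate
    pods: list = field(default_factory=list)


def schedule_mons(nodes, mon_count):
    """Place one mon per usable node and return the names of mons left Pending.

    Toy model (assumption): a mon fits only on an untainted node that does not
    already hold a mon, mirroring required anti-affinity on the hostname key.
    """
    pending = []
    for i in range(mon_count):
        mon = f"rook-ceph-mon-{chr(ord('a') + i)}"
        target = next((n for n in nodes if not n.tainted and not n.pods), None)
        if target is None:
            pending.append(mon)
        else:
            target.pods.append(mon)
    return pending


def cluster(master_tainted=True):
    # Node names taken from the question: kube1 is the master, kube2/kube3 are workers.
    return [Node("kube1", tainted=master_tainted), Node("kube2"), Node("kube3")]


print(schedule_mons(cluster(), 3))                      # ['rook-ceph-mon-c']
print(schedule_mons(cluster(master_tainted=False), 3))  # []
print(schedule_mons(cluster(), 1))                      # []
```

The three calls correspond to the current situation, removing the taint, and lowering the mon count. Adding a fourth untainted node to the list would model the first option.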
Related
This should be a simple task: I just want to run the Kubernetes Dashboard on a clean install of Kubernetes on a Raspberry Pi cluster.
What I've done:
Set up the initial cluster (hostname, static IP, cgroup, swap space, install and configure Docker, install Kubernetes, set up the Kubernetes network and join nodes)
I have flannel installed
I have applied the dashboard
Done a bunch of random testing trying to figure this out
As seen below, the container in the dashboard pod is not working because it cannot access kubernetes-dashboard-csrf. I have no idea why this cannot be accessed; my only thought is that I missed a step when setting up the cluster. I've followed about 6 different guides without success, prioritizing the official guide. I have also seen quite a few people with the same or similar issues, most of whom have not posted a resolution. Thanks!
Nodes: kubectl get nodes
NAME STATUS ROLES AGE VERSION
gus3 Ready <none> 346d v1.23.1
juliet3 Ready <none> 346d v1.23.1
shawn4 Ready <none> 346d v1.23.1
vick4 Ready control-plane,master 346d v1.23.1
All Pods: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-74ff55c5b-7j2xg 1/1 Running 27 346d
kube-system coredns-74ff55c5b-cb2x8 1/1 Running 27 346d
kube-system etcd-vick4 1/1 Running 2 169m
kube-system kube-apiserver-vick4 1/1 Running 2 169m
kube-system kube-controller-manager-vick4 1/1 Running 2 169m
kube-system kube-flannel-ds-gclmp 1/1 Running 0 11m
kube-system kube-flannel-ds-hshjv 1/1 Running 0 12m
kube-system kube-flannel-ds-kdd4w 1/1 Running 0 11m
kube-system kube-flannel-ds-wzhkt 1/1 Running 0 10m
kube-system kube-proxy-4t25v 1/1 Running 26 346d
kube-system kube-proxy-b6vbx 1/1 Running 26 346d
kube-system kube-proxy-jgj4s 1/1 Running 27 346d
kube-system kube-proxy-n65sl 1/1 Running 26 346d
kube-system kube-scheduler-vick4 1/1 Running 2 169m
kubernetes-dashboard dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 77m
kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 18 77m
Resources: kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 79m
pod/kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 19 79m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 172.20.0.191 <none> 8000/TCP 79m
service/kubernetes-dashboard ClusterIP 172.20.0.15 <none> 443/TCP 79m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 79m
deployment.apps/kubernetes-dashboard 0/1 1 0 79m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-5b8896d7fc 1 1 1 79m
replicaset.apps/kubernetes-dashboard-897c7599f 1 1 0 79m
Notice CrashLoopBackOff
Pod Details: kubectl describe pods kubernetes-dashboard-897c7599f-qss5p -n kubernetes-dashboard
Name: kubernetes-dashboard-897c7599f-qss5p
Namespace: kubernetes-dashboard
Priority: 0
Node: shawn4/192.168.10.71
Start Time: Fri, 17 Dec 2021 18:52:15 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=897c7599f
Annotations: <none>
Status: Running
IP: 172.19.1.75
IPs:
IP: 172.19.1.75
Controlled By: ReplicaSet/kubernetes-dashboard-897c7599f
Containers:
kubernetes-dashboard:
Container ID: docker://894a354e40ca1a95885e149dcd75415e0f186ead3f2e05ec0787f4b1c7a29622
Image: kubernetesui/dashboard:v2.4.0
Image ID: docker-pullable://kubernetesui/dashboard@sha256:526850ae4ea9aba360e72b6df69fd3126b129d446efe83ac5250282b85f95b7f
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 17 Dec 2021 20:10:19 +0000
Finished: Fri, 17 Dec 2021 20:10:49 +0000
Ready: False
Restart Count: 19
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-wq9m8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-wq9m8:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-wq9m8
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 21s (x327 over 79m) kubelet Back-off restarting failed container
Logs: kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p
2021/12/17 20:10:19 Starting overwatch
2021/12/17 20:10:19 Using namespace: kubernetes-dashboard
2021/12/17 20:10:19 Using in-cluster config to connect to apiserver
2021/12/17 20:10:19 Using secret token for csrf signing
2021/12/17 20:10:19 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://172.20.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0x400055fae8)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x350
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:502 +0x8c
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:470 +0x40
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:551
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:95 +0x1dc
If you need any more information please ask!
UPDATE 12/29/21:
I fixed this issue by reinstalling the cluster with the newest versions of Kubernetes and Ubuntu.
Turned out there were several issues:
I was using Ubuntu Buster which is deprecated.
My client/server Kubernetes versions were +/-0.3 out of sync
I was following outdated instructions
I reinstalled the whole cluster following the official Kubernetes guide and, with a few snags along the way, it works!
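The version mismatch in that list can be checked mechanically. Below is a small Python sketch (an illustration, not part of the original post) that compares two version strings as printed by kubectl version, assuming the documented kubectl skew policy of at most one minor version between client and API server. The helper names are made up, and the v1.20.0 client version is a hypothetical value; only v1.23.1 appears in the output above.

```python
import re


def parse_version(version):
    """Return (major, minor) from a string such as 'v1.23.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def within_supported_skew(client, server, max_minor_skew=1):
    """True if client and server share a major version and differ by at most
    max_minor_skew minor versions (assumed policy: one minor version)."""
    c_major, c_minor = parse_version(client)
    s_major, s_minor = parse_version(server)
    return c_major == s_major and abs(c_minor - s_minor) <= max_minor_skew


print(within_supported_skew("v1.23.1", "v1.23.1"))  # True
print(within_supported_skew("v1.20.0", "v1.23.1"))  # False (hypothetical client version)
```

If the "0.3" in the list means three minor versions, the second call is the kind of mismatch it describes.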
I have a MetalLB load balancer, a k8s cluster (one master and one worker) on v1.18.5, Helm 3.7, and NFS dynamic volume provisioning installed with Helm. I brought up a JupyterHub instance with Helm. Within a minute everything is set up, but when I open the external IP in my browser, nothing loads. Here is the output of kubectl get all:
pod/continuous-image-puller-4l5gj 1/1 Running 0 23s
pod/hub-6c9cb48df8-k5t4w 1/1 Running 0 23s
pod/nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
pod/nginx2-669c86457c-hc5mv 1/1 Running 0 35h
pod/proxy-66cb767659-svwbv 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 23s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hub ClusterIP 10.111.196.55 <none> 8081/TCP 23s
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 39h
service/nginx2 LoadBalancer 10.106.241.85 10.0.3.240 80:30746/TCP 32h
service/proxy-api ClusterIP 10.109.211.71 <none> 8001/TCP 23s
service/proxy-public LoadBalancer 10.111.233.85 10.0.3.241 80:31336/TCP 23s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/continuous-image-puller 1 1 1 1 1 <none> 23s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hub 1/1 1 1 23s
deployment.apps/nfs-subdir-external-provisioner 1/1 1 1 23h
deployment.apps/nginx2 1/1 1 1 35h
deployment.apps/proxy 1/1 1 1 23s
deployment.apps/user-scheduler 2/2 2 2 23s
NAME DESIRED CURRENT READY AGE
replicaset.apps/hub-6c9cb48df8 1 1 1 23s
replicaset.apps/nfs-subdir-external-provisioner-789697969b 1 1 1 23h
replicaset.apps/nginx2-669c86457c 1 1 1 35h
replicaset.apps/proxy-66cb767659 1 1 1 23s
replicaset.apps/user-scheduler-6d4698dd59 2 2 2 23s
NAME READY AGE
statefulset.apps/user-placeholder 0/0 23s
Also, below is my storage class for reference: kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client cluster.local/nfs-subdir-external-provisioner Delete Immediate true 23h
I will not paste the config file as it is very large. Basically, what I did was:
helm show values jupyterhub/jupyterhub > /tmp/jupyterhub.yaml
(after changing some values)
helm install jupyterhub jupyterhub/jupyterhub --values /tmp/jupyterhub.yaml
The only things I changed were the security key (hex, as mentioned on the website), writing nfs-client wherever it said storageClass or storageClassName, and perhaps the storage size (1Gi/2Gi). That's all. The LoadBalancer works fine, because I ran nginx and can easily open it in my browser. So I decided to check the JupyterHub pods, first getting the pod's name using: kubectl get pods
NAME READY STATUS RESTARTS AGE
continuous-image-puller-4l5gj 1/1 Running 0 20m
hub-6c9cb48df8-k5t4w 1/1 Running 0 20m
nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
nginx2-669c86457c-hc5mv 1/1 Running 0 35h
proxy-66cb767659-svwbv 1/1 Running 0 20m
user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 20m
user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 20m
root@master:/home/ubuntu#
and then using kubectl describe pod hub-6c9cb48df8-k5t4w -n default which gave me this:
Name: hub-6c9cb48df8-k5t4w
Namespace: default
Priority: 0
Node: worker/10.0.0.126
Start Time: Sat, 27 Nov 2021 10:21:43 +0000
Labels: app=jupyterhub
component=hub
hub.jupyter.org/network-access-proxy-api=true
hub.jupyter.org/network-access-proxy-http=true
hub.jupyter.org/network-access-singleuser=true
pod-template-hash=6c9cb48df8
release=jupyterhub
Annotations: checksum/config-map: f746d7e563a064e9158fe6f7f59bdbd463ed24ad7a927d75a1f18c022c3afeaf
checksum/secret: 926186a1b18e5cb9aa5b8c0a177f379299bcf0f05ac4de17d1958422054d15e5
cni.projectcalico.org/podIP: 192.168.171.97/32
cni.projectcalico.org/podIPs: 192.168.171.97/32
Status: Running
IP: 192.168.171.97
IPs:
IP: 192.168.171.97
Controlled By: ReplicaSet/hub-6c9cb48df8
Containers:
hub:
Container ID: docker://1d5e3a812f9712f6d59c09d855b034e2f6bc3e058bad4932db87145ec09f70d1
Image: jupyterhub/k8s-hub:1.2.0
Image ID: docker-pullable://jupyterhub/k8s-hub@sha256:e4770285aaf7230b930643986221757c2cc2e9420f5e21ac892582c96a57ce1c
Port: 8081/TCP
Host Port: 0/TCP
Args:
jupyterhub
--config
/usr/local/etc/jupyterhub/jupyterhub_config.py
--upgrade-db
State: Running
Started: Sat, 27 Nov 2021 10:21:45 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:http/hub/health delay=300s timeout=3s period=10s #success=1 #failure=30
Readiness: http-get http://:http/hub/health delay=0s timeout=1s period=2s #success=1 #failure=1000
Environment:
PYTHONUNBUFFERED: 1
HELM_RELEASE_NAME: jupyterhub
POD_NAMESPACE: default (v1:metadata.namespace)
CONFIGPROXY_AUTH_TOKEN: <set to the key 'hub.config.ConfigurableHTTPProxy.auth_token' in secret 'hub'> Optional: false
Mounts:
/srv/jupyterhub from pvc (rw)
/usr/local/etc/jupyterhub/config/ from config (rw)
/usr/local/etc/jupyterhub/jupyterhub_config.py from config (rw,path="jupyterhub_config.py")
/usr/local/etc/jupyterhub/secret/ from secret (rw)
/usr/local/etc/jupyterhub/z2jh.py from config (rw,path="z2jh.py")
/var/run/secrets/kubernetes.io/serviceaccount from hub-token-zd25x (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hub
Optional: false
secret:
Type: Secret (a volume populated by a Secret)
SecretName: hub
Optional: false
pvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: hub-db-dir
ReadOnly: false
hub-token-zd25x:
Type: Secret (a volume populated by a Secret)
SecretName: hub-token-zd25x
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: hub.jupyter.org/dedicated=core:NoSchedule
hub.jupyter.org_dedicated=core:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21m default-scheduler Successfully assigned default/hub-6c9cb48df8-k5t4w to worker
Normal Pulled 21m kubelet, worker Container image "jupyterhub/k8s-hub:1.2.0" already present on machine
Normal Created 21m kubelet, worker Created container hub
Normal Started 21m kubelet, worker Started container hub
Warning Unhealthy 21m (x3 over 21m) kubelet, worker Readiness probe failed: Get http://192.168.171.97:8081/hub/health: dial tcp 192.168.171.97:8081: connect: connection refused
So I know that the pod is unhealthy. But I do not have any other details to debug this. Any help on how to fix or debug this would be highly appreciated.
Thank you!
The Pod status is always Pending. I'm using Kind locally to study Kubernetes and am trying to bring up a single Pod.
❯ kubectl get pods
NAME READY STATUS RESTARTS AGE
goserver 0/1 Pending 0 12m
The describe output shows no events.
❯ kubectl describe pod goserver
Name: goserver
Namespace: default
Priority: 0
Node: <none>
Labels: app=goserver
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
goserver:
Image: allansduarte/hellogo
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rjkmz (ro)
Volumes:
kube-api-access-rjkmz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
No logs.
❯ kubectl logs goserver
Kubernetes system pods:
❯ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-62bbj 1/1 Running 5 10d
coredns-558bd4d5db-zk9rw 1/1 Running 5 10d
etcd-fullcycle-control-plane 1/1 Running 0 95m
kindnet-66c9q 1/1 Running 5 10d
kindnet-6wfzg 1/1 Running 5 10d
kindnet-sklzj 1/1 Running 5 10d
kindnet-xjh4p 1/1 Running 5 10d
kube-apiserver-fullcycle-control-plane 1/1 Running 0 95m
kube-controller-manager-fullcycle-control-plane 1/1 Running 8 10d
kube-proxy-cdzrj 1/1 Running 5 10d
kube-proxy-jphsn 1/1 Running 5 10d
kube-proxy-mhdtt 1/1 Running 5 10d
kube-proxy-x8jbm 1/1 Running 5 10d
kube-scheduler-fullcycle-control-plane 1/1 Running 8 10d
Any suggestions to continue the investigation?
I switched to macOS Big Sur recently, and Kubernetes and Docker did not start. I then reinstalled Docker and Kubernetes. The options to reset Kubernetes, clean/purge data, or reset to factory defaults don't work.
I'm doing some tests with minikube + calico plugin to see if I can set the pod IP on pod creation.
I've opened the minikube proxy and sent:
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "pod2",
    "annotations": {
      "cni.projectcalico.org/ipAddrs": "[\"172.18.0.50\"]"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "hello-node",
        "image": "k8s.gcr.io/echoserver:1.4",
        "ports": [
          {
            "containerPort": 8081
          }
        ]
      }
    ]
  }
}
But it seems the annotation was ignored. The pod was created using another IP:
NAME READY STATUS RESTARTS AGE IP NODE
pod1 1/1 Running 0 36s 172.18.0.8 minikube
pod2 1/1 Running 0 6s 172.18.0.9 minikube
I've checked the 10-calico.conflist file, the plugin is set to use calico-ipam.
What am I missing?
Edit:
Calico version:
Client Version: v3.14.0
Git commit: c97876ba
Cluster Version: v3.14.0
Cluster Type: k8s,kdd,bgp,kubeadm
Output of kubectl get po --all-namespaces -o wide:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default hello-minikube-64b64df8c9-fd5vz 1/1 Running 1 70m 172.18.0.7 minikube <none> <none>
default pod1 1/1 Running 0 67m 172.18.0.8 minikube <none> <none>
default pod2 1/1 Running 0 66m 172.18.0.9 minikube <none> <none>
kube-system calico-kube-controllers-789f6df884-msvzm 1/1 Running 2 152m 172.18.0.6 minikube <none> <none>
kube-system calico-node-5l2vm 1/1 Running 1 152m 172.17.0.2 minikube <none> <none>
kube-system calicoctl 1/1 Running 1 121m 172.17.0.2 minikube <none> <none>
kube-system coredns-66bff467f8-8hmpv 1/1 Running 3 28h 172.18.0.5 minikube <none> <none>
kube-system coredns-66bff467f8-xwrpj 1/1 Running 3 28h 172.18.0.3 minikube <none> <none>
kube-system etcd-minikube 1/1 Running 2 27h 172.17.0.2 minikube <none> <none>
kube-system kube-apiserver-minikube 1/1 Running 2 27h 172.17.0.2 minikube <none> <none>
kube-system kube-controller-manager-minikube 1/1 Running 3 28h 172.17.0.2 minikube <none> <none>
kube-system kube-proxy-wq29b 1/1 Running 3 28h 172.17.0.2 minikube <none> <none>
kube-system kube-scheduler-minikube 1/1 Running 3 28h 172.17.0.2 minikube <none> <none>
kube-system storage-provisioner 1/1 Running 5 28h 172.17.0.2 minikube <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-84bfdf55ff-kj4t2 1/1 Running 3 28h 172.18.0.4 minikube <none> <none>
kubernetes-dashboard kubernetes-dashboard-696dbcc666-qxc78 1/1 Running 5 28h 172.18.0.2 minikube <none> <none>
Created a local cluster using Vagrant + Ansible + VirtualBox. Manually deploying works fine, but when using Helm:
:~$helm install stable/nginx-ingress --name nginx-ingress-controller --set rbac.create=true
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.0.52.15:10250: i/o timeout
Kubernetes cluster info:
:~$kubectl get nodes,po,deploy,svc,ingress --all-namespaces -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/ubuntu18-kube-master Ready master 32m v1.13.3 10.0.51.15 <none> Ubuntu 18.04.1 LTS 4.15.0-43-generic docker://18.6.1
node/ubuntu18-kube-node-1 Ready <none> 31m v1.13.3 10.0.52.15 <none> Ubuntu 18.04.1 LTS 4.15.0-43-generic docker://18.6.1
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/nginx-server 1/1 Running 0 40s 10.244.1.5 ubuntu18-kube-node-1 <none> <none>
default pod/nginx-server-b8d78876d-cgbjt 1/1 Running 0 4m25s 10.244.1.4 ubuntu18-kube-node-1 <none> <none>
kube-system pod/coredns-86c58d9df4-5rsw2 1/1 Running 0 31m 10.244.0.2 ubuntu18-kube-master <none> <none>
kube-system pod/coredns-86c58d9df4-lfbvd 1/1 Running 0 31m 10.244.0.3 ubuntu18-kube-master <none> <none>
kube-system pod/etcd-ubuntu18-kube-master 1/1 Running 0 31m 10.0.51.15 ubuntu18-kube-master <none> <none>
kube-system pod/kube-apiserver-ubuntu18-kube-master 1/1 Running 0 30m 10.0.51.15 ubuntu18-kube-master <none> <none>
kube-system pod/kube-controller-manager-ubuntu18-kube-master 1/1 Running 0 30m 10.0.51.15 ubuntu18-kube-master <none> <none>
kube-system pod/kube-flannel-ds-amd64-jffqn 1/1 Running 0 31m 10.0.51.15 ubuntu18-kube-master <none> <none>
kube-system pod/kube-flannel-ds-amd64-vc6p2 1/1 Running 0 31m 10.0.52.15 ubuntu18-kube-node-1 <none> <none>
kube-system pod/kube-proxy-fbgmf 1/1 Running 0 31m 10.0.52.15 ubuntu18-kube-node-1 <none> <none>
kube-system pod/kube-proxy-jhs6b 1/1 Running 0 31m 10.0.51.15 ubuntu18-kube-master <none> <none>
kube-system pod/kube-scheduler-ubuntu18-kube-master 1/1 Running 0 31m 10.0.51.15 ubuntu18-kube-master <none> <none>
kube-system pod/tiller-deploy-69ffbf64bc-x8lkc 1/1 Running 0 24m 10.244.1.2 ubuntu18-kube-node-1 <none> <none>
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
default deployment.extensions/nginx-server 1/1 1 1 4m25s nginx-server nginx run=nginx-server
kube-system deployment.extensions/coredns 2/2 2 2 32m coredns k8s.gcr.io/coredns:1.2.6 k8s-app=kube-dns
kube-system deployment.extensions/tiller-deploy 1/1 1 1 24m tiller gcr.io/kubernetes-helm/tiller:v2.12.3 app=helm,name=tiller
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 32m <none>
default service/nginx-server NodePort 10.99.84.201 <none> 80:31811/TCP 12s run=nginx-server
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 32m k8s-app=kube-dns
kube-system service/tiller-deploy ClusterIP 10.99.4.74 <none> 44134/TCP 24m app=helm,name=tiller
Vagrantfile:
...
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  $hosts.each_with_index do |(hostname, parameters), index|
    ip_address = "#{$subnet}.#{$ip_offset + index}"
    config.vm.define vm_name = hostname do |vm_config|
      vm_config.vm.hostname = hostname
      vm_config.vm.box = box
      vm_config.vm.network "private_network", ip: ip_address
      vm_config.vm.provider :virtualbox do |vb|
        vb.gui = false
        vb.name = hostname
        vb.memory = parameters[:memory]
        vb.cpus = parameters[:cpus]
        vb.customize ['modifyvm', :id, '--macaddress1', "08002700005#{index}"]
        vb.customize ['modifyvm', :id, '--natnet1', "10.0.5#{index}.0/24"]
      end
    end
  end
end
Workaround for the VirtualBox issue: set a different MAC address and internal IP for each VM.
I would like to find a solution that can be placed in one of the configuration files (Vagrant, Ansible roles). Any ideas on the problem?
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.0.52.15:10250: i/o timeout
You're getting bitten by a very common Kubernetes-on-Vagrant bug: the kubelet takes its IP address from eth0, which is the NAT interface in Vagrant, instead of using (what I hope you have) the private_network interface from your Vagrantfile. Since the API server dials the kubelet directly at that address (port 10250) for streaming operations, things like kubectl exec and kubectl logs will fail in exactly the way you see.
The solution is to force kubelet to bind to the private network interface, or I guess you could switch your Vagrantfile to use the bridge network, if that's an option for you -- just so long as the interface isn't the NAT one.
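As a rough sketch of that fix, the Python below (not from the original post) checks whether a node's INTERNAL-IP falls inside one of the VirtualBox NAT ranges set by --natnet1 in the question's Vagrantfile, and renders a KUBELET_EXTRA_ARGS line using the kubelet's --node-ip flag. Both helper names are made up. The private address 192.168.56.11 is a placeholder, since the question does not show $subnet or $ip_offset, and putting the line in /etc/default/kubelet assumes a kubeadm install from the Debian/Ubuntu packages.

```python
import ipaddress

# NAT ranges matching the "10.0.5#{index}.0/24" pattern for the two nodes shown.
NAT_NETWORKS = ["10.0.51.0/24", "10.0.52.0/24"]


def on_nat_network(internal_ip, nat_networks=NAT_NETWORKS):
    """True if the address the kubelet registered lies in a NAT range."""
    ip = ipaddress.ip_address(internal_ip)
    return any(ip in ipaddress.ip_network(net) for net in nat_networks)


def kubelet_extra_args(private_ip):
    """Render the env-file line that pins the kubelet to the private address."""
    ipaddress.ip_address(private_ip)  # raises ValueError on a malformed address
    return f"KUBELET_EXTRA_ARGS=--node-ip={private_ip}"


# INTERNAL-IP values from the kubectl get nodes output in the question.
print(on_nat_network("10.0.51.15"))  # True
print(on_nat_network("10.0.52.15"))  # True

# Hypothetical private_network address for the worker node.
print(kubelet_extra_args("192.168.56.11"))
```

Written to each node (for example from an Ansible template) and followed by a kubelet restart, that line should make kubectl get nodes -o wide report the private address as INTERNAL-IP.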
This comes down to how you manage TLS certificates in the cluster; also ensure that port 10250 is reachable.
Here is an example of how I fixed it when I tried to exec into a pod running on a node (an AWS instance in my case):
resource "aws_security_group" "My_VPC_Security_Group" {
  ...
  ingress {
    description = "TLS from VPC"
    from_port   = 10250
    to_port     = 10250
    protocol    = "tcp"
    # 0.0.0.0/0 admits any source address; narrow it to your VPC CIDR where possible
    cidr_blocks = ["0.0.0.0/0"]
  }
}
For more details, see: http://carnal0wnage.attackresearch.com/2019/01/kubernetes-unauth-kublet-api-10250.html
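To confirm that port 10250 is actually reachable from the machine where the API server runs, a quick TCP probe is enough. The Python sketch below is an illustration, not part of the original answer; port_reachable is a made-up helper name, and the address in the usage comment is the node IP from the error message in the question. It only tests that a TCP connection can be opened and says nothing about TLS or kubelet authentication.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False


# Example call, run from the control-plane node:
#   port_reachable("10.0.52.15", 10250)
```

A False result from the control-plane node points at routing or firewall rules (such as the security group above) rather than at Helm or kubectl.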