I am confused about why my pods stay in Pending status.
Vitess seems to have a problem scheduling the vttablet pods on nodes. I built a Kubernetes cluster with two worker nodes (A and B) and started the vttablets on it, but only two vttablets started normally; the other three stayed in the Pending state.
When I allowed the master node to schedule pods, the three pending vttablets all started on the master (first with errors, then running normally). When I then created tables, two of the vttablets failed to execute.
When I added two new nodes (C and D) to the Kubernetes cluster, tore down Vitess, and restarted the vttablets, the three vttablet pods still remained in the Pending state. Also, if I remove node A or node B, the vttablet on it is lost and is not restarted on a new node. I then tore down Vitess and the Kubernetes cluster, rebuilt the cluster with nodes C and D as the two workers, and this time all vttablets remain in Pending status.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default etcd-global-5zh4k77slf 1/1 Running 0 46m 192.168.2.3 t-searchredis-a2 <none>
default etcd-global-f7db9nnfq9 1/1 Running 0 45m 192.168.2.5 t-searchredis-a2 <none>
default etcd-global-ksh5r9k45l 1/1 Running 0 45m 192.168.1.4 t-searchredis-a1 <none>
default etcd-operator-6f44498865-t84l5 1/1 Running 0 50m 192.168.2.2 t-searchredis-a2 <none>
default etcd-test-5g5lmcrl2x 1/1 Running 0 46m 192.168.2.4 t-searchredis-a2 <none>
default etcd-test-g4xrkk7wgg 1/1 Running 0 45m 192.168.1.5 t-searchredis-a1 <none>
default etcd-test-jkq4rjrwm8 1/1 Running 0 45m 192.168.2.6 t-searchredis-a2 <none>
default vtctld-z5d46 1/1 Running 0 44m 192.168.1.6 t-searchredis-a1 <none>
default vttablet-100 0/2 Pending 0 40m <none> <none> <none>
default vttablet-101 0/2 Pending 0 40m <none> <none> <none>
default vttablet-102 0/2 Pending 0 40m <none> <none> <none>
default vttablet-103 0/2 Pending 0 40m <none> <none> <none>
default vttablet-104 0/2 Pending 0 40m <none> <none> <none>
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: 2018-11-27T07:25:19Z
labels:
app: vitess
component: vttablet
keyspace: test_keyspace
shard: "0"
tablet: test-0000000100
name: vttablet-100
namespace: default
resourceVersion: "22304"
selfLink: /api/v1/namespaces/default/pods/vttablet-100
uid: 98258046-f215-11e8-b6a1-fa163e0411d1
spec:
containers:
- command:
- bash
- -c
- |-
set -e
mkdir -p $VTDATAROOT/tmp
chown -R vitess /vt
su -p -s /bin/bash -c "/vt/bin/vttablet -binlog_use_v3_resharding_mode -topo_implementation etcd2 -topo_global_server_address http://etcd-global-client:2379 -topo_global_root /global -log_dir $VTDATAROOT/tmp -alsologtostderr -port 15002 -grpc_port 16002 -service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' -tablet-path test-0000000100 -tablet_hostname $(hostname -i) -init_keyspace test_keyspace -init_shard 0 -init_tablet_type replica -health_check_interval 5s -mysqlctl_socket $VTDATAROOT/mysqlctl.sock -enable_semi_sync -enable_replication_reporter -orc_api_url http://orchestrator/api -orc_discover_interval 5m -restore_from_backup -backup_storage_implementation file -file_backup_storage_root '/usr/local/MySQL_DB_Backup/test'" vitess
env:
- name: EXTRA_MY_CNF
value: /vt/config/mycnf/master_mysql56.cnf
image: vitess/lite
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /debug/vars
port: 15002
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: vttablet
ports:
- containerPort: 15002
name: web
protocol: TCP
- containerPort: 16002
name: grpc
protocol: TCP
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /dev/log
name: syslog
- mountPath: /vt/vtdataroot
name: vtdataroot
- mountPath: /etc/ssl/certs/ca-certificates.crt
name: certs
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-7g2jb
readOnly: true
- command:
- sh
- -c
- |-
mkdir -p $VTDATAROOT/tmp && chown -R vitess /vt
su -p -c "/vt/bin/mysqlctld -log_dir $VTDATAROOT/tmp -alsologtostderr -tablet_uid 100 -socket_file $VTDATAROOT/mysqlctl.sock -init_db_sql_file $VTROOT/config/init_db.sql" vitess
env:
- name: EXTRA_MY_CNF
value: /vt/config/mycnf/master_mysql56.cnf
image: vitess/lite
imagePullPolicy: Always
name: mysql
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /dev/log
name: syslog
- mountPath: /vt/vtdataroot
name: vtdataroot
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-7g2jb
readOnly: true
dnsPolicy: ClusterFirst
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- hostPath:
path: /dev/log
type: ""
name: syslog
- emptyDir: {}
name: vtdataroot
- hostPath:
path: /etc/ssl/certs/ca-certificates.crt
type: ""
name: certs
- name: default-token-7g2jb
secret:
defaultMode: 420
secretName: default-token-7g2jb
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-11-27T07:25:19Z
message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
2 Insufficient cpu.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Guaranteed
As you can see down at the bottom:
message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
2 Insufficient cpu.'
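This means that your two worker nodes do not have enough unallocated CPU for the requests specified in the pod, and the third node (the master) is tainted. Each vttablet pod requests 500m of CPU for each of its two containers, so 1 CPU per pod. You will need more workers, or smaller CPU requests.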
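To make the arithmetic behind the "Insufficient cpu" message concrete, here is a minimal Python sketch. It is a simplified model, not the scheduler's real logic. The per-container requests (500m each) come from the pod spec above; the node's allocatable CPU (2 cores) and the 600m already requested by other pods are hypothetical numbers chosen for illustration. The real values can be read from kubectl describe node.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("500m", "2", "0.2") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def pod_cpu_request(container_requests) -> int:
    """A pod's CPU request is the sum of its containers' requests."""
    return sum(parse_cpu(q) for q in container_requests)


def pods_that_fit(allocatable_m: int, already_requested_m: int, per_pod_m: int) -> int:
    """How many more pods of this size fit on one node (CPU only)."""
    free = max(allocatable_m - already_requested_m, 0)
    return free // per_pod_m


# From the pod spec: the vttablet and mysql containers request 500m each.
per_pod = pod_cpu_request(["500m", "500m"])
print(per_pod)        # 1000 millicores per vttablet pod
print(per_pod * 5)    # 5000 millicores for the five tablets

# Hypothetical node: 2 cores allocatable, 600m already requested (assumption).
print(pods_that_fit(parse_cpu("2"), 600, per_pod))  # 1
```

With numbers like these, each worker has room for only one tablet, so lowering the requests or adding CPU capacity are the two ways out.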
I have a Kong deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-test-kong
  labels:
    app: local-test-kong
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-test-kong
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: local-test-kong
    spec:
      automountServiceAccountToken: false
      containers:
      - envFrom:
        - configMapRef:
            name: kong-env-vars
        image: kong:2.6
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - /bin/sleep 15 && kong quit
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: status
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: proxy
        ports:
        - containerPort: 8000
          name: proxy
          protocol: TCP
        - containerPort: 8100
          name: status
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: status
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources: # ToDo
          limits:
            cpu: 256m
            memory: 256Mi
          requests:
            cpu: 256m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /kong_prefix/
          name: kong-prefix-dir
        - mountPath: /tmp
          name: tmp-dir
        - mountPath: /kong_dbless/
          name: kong-custom-dbless-config-volume
      terminationGracePeriodSeconds: 30
      volumes:
      - name: kong-prefix-dir
      - name: tmp-dir
      - configMap:
          defaultMode: 0555
          name: kong-declarative
        name: kong-custom-dbless-config-volume
I applied this YAML in GKE. Then I ran kubectl describe on its pod.
➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
local-test-kong-678598ffc6-ll9s8 1/1 Running 0 25m
➜ kubectl describe pod/local-test-kong-678598ffc6-ll9s8
Name: local-test-kong-678598ffc6-ll9s8
Namespace: local-test-kong
Priority: 0
Node: gke-paas-cluster-prd-tf9-default-pool-e7cb502a-ggxl/10.128.64.95
Start Time: Wed, 23 Nov 2022 00:12:56 +0800
Labels: app=local-test-kong
pod-template-hash=678598ffc6
Annotations: kubectl.kubernetes.io/restartedAt: 2022-11-23T00:12:56+08:00
Status: Running
IP: 10.128.96.104
IPs:
IP: 10.128.96.104
Controlled By: ReplicaSet/local-test-kong-678598ffc6
Containers:
proxy:
Container ID: containerd://1bd392488cfe33dcc62f717b3b8831349e8cf573326add846c9c843c7bf15e2a
Image: kong:2.6
Image ID: docker.io/library/kong@sha256:62eb6d17133b007cbf5831b39197c669b8700c55283270395b876d1ecfd69a70
Ports: 8000/TCP, 8100/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 23 Nov 2022 00:12:58 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 256m
memory: 256Mi
Requests:
cpu: 256m
memory: 256Mi
Liveness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment Variables from:
kong-env-vars ConfigMap Optional: false
Environment: <none>
Mounts:
/kong_dbless/ from kong-custom-dbless-config-volume (rw)
/kong_prefix/ from kong-prefix-dir (rw)
/tmp from tmp-dir (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kong-prefix-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kong-custom-dbless-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kong-declarative
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned local-test-kong/local-test-kong-678598ffc6-ll9s8 to gke-paas-cluster-prd-tf9-default-pool-e7cb502a-ggxl
Normal Pulled 25m kubelet Container image "kong:2.6" already present on machine
Normal Created 25m kubelet Created container proxy
Normal Started 25m kubelet Started container proxy
➜
I applied the same YAML in MicroK8s on my local machine (macOS) and then ran kubectl describe on its pod.
➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
local-test-kong-54cfc585cb-7grj8 1/1 Running 0 86s
➜ kubectl describe pod/local-test-kong-54cfc585cb-7grj8
Name: local-test-kong-54cfc585cb-7grj8
Namespace: local-test-kong
Priority: 0
Node: microk8s-vm/192.168.64.5
Start Time: Wed, 23 Nov 2022 00:39:33 +0800
Labels: app=local-test-kong
pod-template-hash=54cfc585cb
Annotations: cni.projectcalico.org/podIP: 10.1.254.79/32
cni.projectcalico.org/podIPs: 10.1.254.79/32
kubectl.kubernetes.io/restartedAt: 2022-11-23T00:39:33+08:00
Status: Running
IP: 10.1.254.79
IPs:
IP: 10.1.254.79
Controlled By: ReplicaSet/local-test-kong-54cfc585cb
Containers:
proxy:
Container ID: containerd://d60d09ca8b77ee59c80ea060dcb651c3e346c3a5f0147b0d061790c52193d93d
Image: kong:2.6
Image ID: docker.io/library/kong@sha256:62eb6d17133b007cbf5831b39197c669b8700c55283270395b876d1ecfd69a70
Ports: 8000/TCP, 8100/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 23 Nov 2022 00:39:37 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 256m
memory: 256Mi
Requests:
cpu: 256m
memory: 256Mi
Liveness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment Variables from:
kong-env-vars ConfigMap Optional: false
Environment: <none>
Mounts:
/kong_dbless/ from kong-custom-dbless-config-volume (rw)
/kong_prefix/ from kong-prefix-dir (rw)
/tmp from tmp-dir (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kong-prefix-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kong-custom-dbless-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kong-declarative
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 92s default-scheduler Successfully assigned local-test-kong/local-test-kong-54cfc585cb-7grj8 to microk8s-vm
Normal Pulled 90s kubelet Container image "kong:2.6" already present on machine
Normal Created 90s kubelet Created container proxy
Normal Started 89s kubelet Started container proxy
Warning Unhealthy 68s kubelet Readiness probe failed: Get "http://10.1.254.79:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 68s kubelet Liveness probe failed: Get "http://10.1.254.79:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
➜
It's the exact same deployment YAML. The deployment created in the GKE cluster runs fine with no complaints, but the one created in my local MicroK8s (on macOS) shows probe failures.
What could I be missing when deploying to MicroK8s (on macOS)?
Your readiness and liveness probes are failing on the local pod on port 8100. It looks like a firewall rule is preventing internal pod and/or pod-to-pod communication.
As per the docs:
You may need to configure your firewall to allow pod-to-pod and pod-to-internet communication:
sudo ufw allow in on cni0 && sudo ufw allow out on cni0
sudo ufw default allow routed
I'm trying to deploy a MongoDB replica set on a MicroK8s cluster, which I have installed on a VM running Ubuntu 20.04. After the deployment, the mongo pods do not run but crash. I've enabled the MicroK8s storage, dns and rbac add-ons, but the problem persists. Can anyone help me find the reason behind it? Below is my manifest file:
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  selector:
    matchLabels:
      role: mongo
      environment: test
  serviceName: mongodb-service
  replicas: 3
  template:
    metadata:
      labels:
        role: mongo
        environment: test
        replicaset: MainRepSet
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: replicaset
                  operator: In
                  values:
                  - MainRepSet
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 10
      volumes:
      - name: secrets-volume
        secret:
          secretName: shared-bootstrap-data
          defaultMode: 256
      containers:
      - name: mongod-container
        #image: pkdone/mongo-ent:3.4
        image: mongo
        command:
        - "numactl"
        - "--interleave=all"
        - "mongod"
        - "--wiredTigerCacheSizeGB"
        - "0.1"
        - "--bind_ip"
        - "0.0.0.0"
        - "--replSet"
        - "MainRepSet"
        - "--auth"
        - "--clusterAuthMode"
        - "keyFile"
        - "--keyFile"
        - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
        - "--setParameter"
        - "authenticationMechanisms=SCRAM-SHA-1"
        resources:
          requests:
            cpu: 0.2
            memory: 200Mi
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: secrets-volume
          readOnly: true
          mountPath: /etc/secrets-volume
        - name: mongodb-persistent-storage-claim
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongodb-persistent-storage-claim
    spec:
      storageClassName: microk8s-hostpath
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi
Also, here are the pv, pvc and sc outputs:
yyy@xxx:$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mongodb-persistent-storage-claim-mongo-0 Bound pvc-1b3de8f7-e416-4a1a-9c44-44a0422e0413 5Gi RWO microk8s-hostpath 13m
yyy@xxx:$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-5b75ddf6-abbd-4ff3-a135-0312df1e6703 20Gi RWX Delete Bound container-registry/registry-claim microk8s-hostpath 38m
pvc-1b3de8f7-e416-4a1a-9c44-44a0422e0413 5Gi RWO Delete Bound default/mongodb-persistent-storage-claim-mongo-0 microk8s-hostpath 13m
yyy@xxx:$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
microk8s-hostpath (default) microk8s.io/hostpath Delete Immediate false 108m
yyy@xxx:$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
metrics-server-8bbfb4bdb-xvwcw 1/1 Running 1 148m
dashboard-metrics-scraper-78d7698477-4qdhj 1/1 Running 0 146m
kubernetes-dashboard-85fd7f45cb-6t7xr 1/1 Running 0 146m
hostpath-provisioner-5c65fbdb4f-ff7cl 1/1 Running 0 113m
coredns-7f9c69c78c-dr5kt 1/1 Running 0 65m
calico-kube-controllers-f7868dd95-wtf8j 1/1 Running 0 150m
calico-node-knzc2 1/1 Running 0 150m
I have installed the cluster using this command:
sudo snap install microk8s --classic --channel=1.21
Output of mongodb deployment:
yyy@xxx:$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/mongo-0 0/1 CrashLoopBackOff 5 4m18s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 109m
service/mongodb-service ClusterIP None <none> 27017/TCP 4m19s
NAME READY AGE
statefulset.apps/mongo 0/3 4m19s
Pod logs:
yyy@xxx:$ kubectl logs pod/mongo-0
{"t":{"$date":"2021-09-07T16:21:13.191Z"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"-","msg":"Error during global initialization","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"storage.wiredTiger.engineConfig.cacheSizeGB must be greater than or equal to 0.25"}}}
yyy@xxx:$ kubectl describe pod/mongo-0
Name: mongo-0
Namespace: default
Priority: 0
Node: citest1/192.168.9.105
Start Time: Tue, 07 Sep 2021 16:17:38 +0000
Labels: controller-revision-hash=mongo-66bd776569
environment=test
replicaset=MainRepSet
role=mongo
statefulset.kubernetes.io/pod-name=mongo-0
Annotations: cni.projectcalico.org/podIP: 10.1.150.136/32
cni.projectcalico.org/podIPs: 10.1.150.136/32
Status: Running
IP: 10.1.150.136
IPs:
IP: 10.1.150.136
Controlled By: StatefulSet/mongo
Containers:
mongod-container:
Container ID: containerd://458e21fac3e87dcf304a9701da0eb827b2646efe94cabce7f283cd49f740c15d
Image: mongo
Image ID: docker.io/library/mongo@sha256:58ea1bc09f269a9b85b7e1fae83b7505952aaa521afaaca4131f558955743842
Port: 27017/TCP
Host Port: 0/TCP
Command:
numactl
--interleave=all
mongod
--wiredTigerCacheSizeGB
0.1
--bind_ip
0.0.0.0
--replSet
MainRepSet
--auth
--clusterAuthMode
keyFile
--keyFile
/etc/secrets-volume/internal-auth-mongodb-keyfile
--setParameter
authenticationMechanisms=SCRAM-SHA-1
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 07 Sep 2021 16:24:03 +0000
Finished: Tue, 07 Sep 2021 16:24:03 +0000
Ready: False
Restart Count: 6
Requests:
cpu: 200m
memory: 200Mi
Environment: <none>
Mounts:
/data/db from mongodb-persistent-storage-claim (rw)
/etc/secrets-volume from secrets-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b7nf8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
mongodb-persistent-storage-claim:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongodb-persistent-storage-claim-mongo-0
ReadOnly: false
secrets-volume:
Type: Secret (a volume populated by a Secret)
SecretName: shared-bootstrap-data
Optional: false
kube-api-access-b7nf8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7m53s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 7m52s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 7m50s default-scheduler Successfully assigned default/mongo-0 to citest1
Normal Pulled 7m25s kubelet Successfully pulled image "mongo" in 25.215669443s
Normal Pulled 7m21s kubelet Successfully pulled image "mongo" in 1.192994197s
Normal Pulled 7m6s kubelet Successfully pulled image "mongo" in 1.203239709s
Normal Pulled 6m38s kubelet Successfully pulled image "mongo" in 1.213451175s
Normal Created 6m38s (x4 over 7m23s) kubelet Created container mongod-container
Normal Started 6m37s (x4 over 7m23s) kubelet Started container mongod-container
Normal Pulling 5m47s (x5 over 7m50s) kubelet Pulling image "mongo"
Warning BackOff 2m49s (x23 over 7m20s) kubelet Back-off restarting failed container
The logs you provided show that the wiredTigerCacheSizeGB parameter is set incorrectly. In your case it is 0.1, and according to the message
"code":2,"codeName":"BadValue","errmsg":"storage.wiredTiger.engineConfig.cacheSizeGB must be greater than or equal to 0.25"
it should be at least 0.25.
In the containers section:
containers:
- name: mongod-container
  #image: pkdone/mongo-ent:3.4
  image: mongo
  command:
  - "numactl"
  - "--interleave=all"
  - "mongod"
  - "--wiredTigerCacheSizeGB"
  - "0.1"
  - "--bind_ip"
  - "0.0.0.0"
  - "--replSet"
  - "MainRepSet"
  - "--auth"
  - "--clusterAuthMode"
  - "keyFile"
  - "--keyFile"
  - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
  - "--setParameter"
  - "authenticationMechanisms=SCRAM-SHA-1"
you should change, in this place,
- "--wiredTigerCacheSizeGB"
- "0.1"
the value "0.1" to any value greater than or equal to "0.25".
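As a quick sanity check before redeploying, the container command can be validated against the lower bound from the error message. This is only an illustrative Python sketch: check_cache_size and MIN_CACHE_GB are hypothetical names, the command lists are shortened copies of the manifest's command, and the 0.25 threshold is taken from the log line above.

```python
MIN_CACHE_GB = 0.25  # lower bound quoted in the mongod error message


def check_cache_size(command):
    """Return the configured cache size in GB, or None if the flag is absent.

    Raises ValueError when the value is below the minimum mongod accepts.
    """
    flag = "--wiredTigerCacheSizeGB"
    if flag not in command:
        return None
    value = float(command[command.index(flag) + 1])
    if value < MIN_CACHE_GB:
        raise ValueError(f"{flag} is {value}, must be >= {MIN_CACHE_GB}")
    return value


# Shortened versions of the container command (illustrative only).
broken = ["mongod", "--wiredTigerCacheSizeGB", "0.1", "--replSet", "MainRepSet"]
fixed = ["mongod", "--wiredTigerCacheSizeGB", "0.25", "--replSet", "MainRepSet"]

print(check_cache_size(fixed))  # 0.25
try:
    check_cache_size(broken)
except ValueError as err:
    print(err)
```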
Additionally I have seen another error:
1 pod has unbound immediate PersistentVolumeClaims
It should be related to what I wrote earlier. However, you may find alternative ways to solve it here, here and here.
I am new to k8s and am trying to set up Prometheus monitoring for k8s. I used "helm install" to set up Prometheus. Now:
two pods are still in Pending state:
prometheus-server
prometheus-alertmanager
I manually created a persistent volume for each of them.
Can anyone help me map these PVs to the PVCs created by the Helm chart?
[centos@k8smaster1 ~]$ kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
prometheus-alertmanager-7757d759b8-x6bd7 0/2 Pending 0 44m
prometheus-kube-state-metrics-7f85b5d86c-cq9kr 1/1 Running 0 44m
prometheus-node-exporter-5rz2k 1/1 Running 0 44m
prometheus-pushgateway-5b8465d455-672d2 1/1 Running 0 44m
prometheus-server-7f8b5fc64b-w626v 0/2 Pending 0 44m
[centos@k8smaster1 ~]$ kubectl get pv
prometheus-alertmanager 3Gi RWX Retain Available 22m
prometheus-server 12Gi RWX Retain Available 30m
[centos@k8smaster1 ~]$ kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-alertmanager Pending 20m
prometheus-server Pending 20m
[centos@k8smaster1 ~]$ kubectl describe pvc prometheus-alertmanager -n monitoring
Name: prometheus-alertmanager
Namespace: monitoring
StorageClass:
Status: Pending
Volume:
Labels: app=prometheus
chart=prometheus-8.15.0
component=alertmanager
heritage=Tiller
release=prometheus
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 116s (x83 over 22m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
Mounted By: prometheus-alertmanager-7757d759b8-x6bd7
I expect the pods to reach the Running state.
!!!UPDATE!!!
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-alertmanager Pending local-storage 4m29s
prometheus-server Pending local-storage 4m29s
[centos@k8smaster1 prometheus_pv_storage]$ kubectl describe pvc prometheus-server -n monitoring
Name: prometheus-server
Namespace: monitoring
StorageClass: local-storage
Status: Pending
Volume:
Labels: app=prometheus
chart=prometheus-8.15.0
component=server
heritage=Tiller
release=prometheus
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 11s (x22 over 4m59s) persistentvolume-controller waiting for first consumer to be created before binding
Mounted By: prometheus-server-7f8b5fc64b-bqf42
!!UPDATE-2!!
[centos@k8smaster1 ~]$ kubectl get pods prometheus-server-7f8b5fc64b-bqf42 -n monitoring -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2019-08-18T16:10:54Z"
generateName: prometheus-server-7f8b5fc64b-
labels:
app: prometheus
chart: prometheus-8.15.0
component: server
heritage: Tiller
pod-template-hash: 7f8b5fc64b
release: prometheus
name: prometheus-server-7f8b5fc64b-bqf42
namespace: monitoring
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: prometheus-server-7f8b5fc64b
uid: c1979bcb-c1d2-11e9-819d-fa163ebb8452
resourceVersion: "2461054"
selfLink: /api/v1/namespaces/monitoring/pods/prometheus-server-7f8b5fc64b-bqf42
uid: c19890d1-c1d2-11e9-819d-fa163ebb8452
spec:
containers:
- args:
- --volume-dir=/etc/config
- --webhook-url=http://127.0.0.1:9090/-/reload
image: jimmidyson/configmap-reload:v0.2.2
imagePullPolicy: IfNotPresent
name: prometheus-server-configmap-reload
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-server-token-7h2df
readOnly: true
- args:
- --storage.tsdb.retention.time=15d
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
image: prom/prometheus:v2.11.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /-/healthy
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
name: prometheus-server
ports:
- containerPort: 9090
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /-/ready
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
- mountPath: /data
name: storage-volume
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-server-token-7h2df
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 65534
runAsGroup: 65534
runAsNonRoot: true
runAsUser: 65534
serviceAccount: prometheus-server
serviceAccountName: prometheus-server
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- configMap:
defaultMode: 420
name: prometheus-server
name: config-volume
- name: storage-volume
persistentVolumeClaim:
claimName: prometheus-server
- name: prometheus-server-token-7h2df
secret:
defaultMode: 420
secretName: prometheus-server-token-7h2df
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-08-18T16:10:54Z"
message: '0/2 nodes are available: 1 node(s) didn''t find available persistent
volumes to bind, 1 node(s) had taints that the pod didn''t tolerate.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: BestEffort
Also, I have the volumes created and assigned to the local-storage storage class:
[centos@k8smaster1 prometheus_pv]$ kubectl get pv -n monitoring
prometheus-alertmanager 3Gi RWX Retain Available local-storage 2d19h
prometheus-server 12Gi RWX Retain Available local-storage 2d19h
If you are on EKS, your nodes need to have the following permission
arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
and the cluster needs the Amazon EBS CSI Driver add-on.
Prometheus will try to create PersistentVolumeClaims with accessModes set to ReadWriteOnce. A PVC is matched to a PersistentVolume only if the PV's access modes include the ones the claim requests. Change the access mode of your PVs to ReadWriteOnce and it should work.
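The matching rule can be sketched in a few lines of Python. This is a simplified model of static PV/PVC binding, not the controller's actual code: it only checks storage class, access modes and capacity, and ignores things such as node affinity and volumeBindingMode. The PV values mirror the kubectl get pv output above; the 8Gi claim size and the can_bind helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    capacity_gi: int
    access_modes: frozenset
    storage_class: str


@dataclass(frozen=True)
class Claim:
    name: str
    request_gi: int
    access_modes: frozenset
    storage_class: str


def can_bind(pv: Volume, pvc: Claim) -> bool:
    """Simplified check: same class, PV offers every requested mode, PV is big enough."""
    return (
        pv.storage_class == pvc.storage_class
        and pvc.access_modes <= pv.access_modes
        and pv.capacity_gi >= pvc.request_gi
    )


# Claim size is hypothetical; the access mode is the one named in the answer.
claim = Claim("prometheus-server", 8, frozenset({"ReadWriteOnce"}), "local-storage")

# The PV as created in the question (RWX only) versus the suggested fix (RWO).
pv_rwx = Volume("prometheus-server", 12, frozenset({"ReadWriteMany"}), "local-storage")
pv_rwo = Volume("prometheus-server", 12, frozenset({"ReadWriteOnce"}), "local-storage")

print(can_bind(pv_rwx, claim))  # False
print(can_bind(pv_rwo, claim))  # True
```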
I'm running a microk8s cluster on an Ubuntu server at home, and I have it connected to a local NAS server for persistent storage. I've been using it as my personal proving grounds for learning Kubernetes, but I seem to encounter problem after problem at just about every step of the way.
I've got the NFS Client Provisioner Helm chart installed which I've confirmed works - it will dynamically provision PVCs on my NAS server. I later was able to successfully install the Postgres Helm chart, or so I thought. After creating it I was able to connect to it using a SQL client, and I was feeling good.
Then, a couple of days later, I noticed the pod was showing 0/1 containers ready, although, interestingly, the nfs-client-provisioner pod was still showing 1/1. Long story short: I've deleted/purged the Postgres Helm chart and attempted to reinstall it, but now it no longer works. In fact, nothing new that I try to deploy works. Everything looks as though it's going to work, but then just hangs on either Init or ContainerCreating forever.
With Postgres in particular, the command I've been running is this:
helm install --name postgres stable/postgresql -f postgres.yaml
And my postgres.yaml file looks like this:
persistence:
  storageClass: nfs-client
  accessMode: ReadWriteMany
  size: 2Gi
But if I do a kubectl get pods, I still see this:
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner 1/1 Running 1 11d
postgres-postgresql-0 0/1 Init:0/1 0 3h51m
If I do a kubectl describe pod postgres-postgresql-0, this is the output:
Name: postgres-postgresql-0
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: stjohn/192.168.1.217
Start Time: Thu, 28 Mar 2019 12:51:02 -0500
Labels: app=postgresql
chart=postgresql-3.11.7
controller-revision-hash=postgres-postgresql-5bfb9cc56d
heritage=Tiller
release=postgres
role=master
statefulset.kubernetes.io/pod-name=postgres-postgresql-0
Annotations: <none>
Status: Pending
IP:
Controlled By: StatefulSet/postgres-postgresql
Init Containers:
init-chmod-data:
Container ID:
Image: docker.io/bitnami/minideb:latest
Image ID:
Port: <none>
Host Port: <none>
Command:
sh
-c
chown -R 1001:1001 /bitnami
if [ -d /bitnami/postgresql/data ]; then
chmod 0700 /bitnami/postgresql/data;
fi
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
memory: 256Mi
Environment: <none>
Mounts:
/bitnami/postgresql from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-h4gph (ro)
Containers:
postgres-postgresql:
Container ID:
Image: docker.io/bitnami/postgresql:10.7.0
Image ID:
Port: 5432/TCP
Host Port: 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
memory: 256Mi
Liveness: exec [sh -c exec pg_isready -U "postgres" -h localhost] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [sh -c exec pg_isready -U "postgres" -h localhost] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
PGDATA: /bitnami/postgresql
POSTGRES_USER: postgres
POSTGRES_PASSWORD: <set to the key 'postgresql-password' in secret 'postgres-postgresql'> Optional: false
Mounts:
/bitnami/postgresql from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-h4gph (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-postgres-postgresql-0
ReadOnly: false
default-token-h4gph:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-h4gph
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
And if I do a kubectl get pod postgres-postgresql-0 -o yaml, this is the output:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2019-03-28T17:51:02Z"
generateName: postgres-postgresql-
labels:
app: postgresql
chart: postgresql-3.11.7
controller-revision-hash: postgres-postgresql-5bfb9cc56d
heritage: Tiller
release: postgres
role: master
statefulset.kubernetes.io/pod-name: postgres-postgresql-0
name: postgres-postgresql-0
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: postgres-postgresql
uid: 0d3ef673-5182-11e9-bf14-b8975a0ca30c
resourceVersion: "1953329"
selfLink: /api/v1/namespaces/default/pods/postgres-postgresql-0
uid: 0d4dfb56-5182-11e9-bf14-b8975a0ca30c
spec:
containers:
- env:
- name: PGDATA
value: /bitnami/postgresql
- name: POSTGRES_USER
value: postgres
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
key: postgresql-password
name: postgres-postgresql
image: docker.io/bitnami/postgresql:10.7.0
imagePullPolicy: Always
livenessProbe:
exec:
command:
- sh
- -c
- exec pg_isready -U "postgres" -h localhost
failureThreshold: 6
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: postgres-postgresql
ports:
- containerPort: 5432
name: postgresql
protocol: TCP
readinessProbe:
exec:
command:
- sh
- -c
- exec pg_isready -U "postgres" -h localhost
failureThreshold: 6
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
requests:
cpu: 250m
memory: 256Mi
securityContext:
procMount: Default
runAsUser: 1001
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /bitnami/postgresql
name: data
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-h4gph
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: postgres-postgresql-0
initContainers:
- command:
- sh
- -c
- |
chown -R 1001:1001 /bitnami
if [ -d /bitnami/postgresql/data ]; then
chmod 0700 /bitnami/postgresql/data;
fi
image: docker.io/bitnami/minideb:latest
imagePullPolicy: Always
name: init-chmod-data
resources:
requests:
cpu: 250m
memory: 256Mi
securityContext:
procMount: Default
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /bitnami/postgresql
name: data
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-h4gph
readOnly: true
nodeName: stjohn
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1001
serviceAccount: default
serviceAccountName: default
subdomain: postgres-postgresql-headless
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: data
persistentVolumeClaim:
claimName: data-postgres-postgresql-0
- name: default-token-h4gph
secret:
defaultMode: 420
secretName: default-token-h4gph
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-03-28T17:51:02Z"
message: 'containers with incomplete status: [init-chmod-data]'
reason: ContainersNotInitialized
status: "False"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2019-03-28T17:51:02Z"
message: 'containers with unready status: [postgres-postgresql]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2019-03-28T17:51:02Z"
message: 'containers with unready status: [postgres-postgresql]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2019-03-28T17:51:02Z"
status: "True"
type: PodScheduled
containerStatuses:
- image: docker.io/bitnami/postgresql:10.7.0
imageID: ""
lastState: {}
name: postgres-postgresql
ready: false
restartCount: 0
state:
waiting:
reason: PodInitializing
hostIP: 192.168.1.217
initContainerStatuses:
- image: docker.io/bitnami/minideb:latest
imageID: ""
lastState: {}
name: init-chmod-data
ready: false
restartCount: 0
state:
waiting:
reason: PodInitializing
phase: Pending
qosClass: Burstable
startTime: "2019-03-28T17:51:02Z"
I don't see anything obvious in these to be able to pinpoint what might be going on. And I've already rebooted the server just to see if that might help. Any thoughts? Why won't my containers start?
It looks like your init container is stuck in the PodInitializing state. The most likely cause is that your PVC is not ready. I recommend describing the data-postgres-postgresql-0 PVC (kubectl describe pvc data-postgres-postgresql-0) to make sure that the volume has actually been provisioned and that the claim is in the Bound state. Your NFS provisioner may be working, but that specific PV/PVC may not have been created due to an error. I have run into similar behavior with the EFS provisioner on AWS.
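To make the PVC check above scriptable, here is a minimal Python sketch. It assumes you feed it the JSON printed by kubectl get pvc data-postgres-postgresql-0 -o json; the sample document below is made up for illustration and is not output from the cluster in the question. The function name pvc_is_bound is hypothetical.

```python
import json


def pvc_is_bound(pvc_json: str) -> bool:
    """Return True if the PVC JSON (from `kubectl get pvc NAME -o json`) is Bound."""
    pvc = json.loads(pvc_json)
    # status.phase of a PersistentVolumeClaim is Pending, Bound, or Lost.
    return pvc.get("status", {}).get("phase") == "Bound"


# Illustrative input only; not captured from the asker's cluster.
sample = json.dumps({
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-postgres-postgresql-0", "namespace": "default"},
    "status": {"phase": "Pending"},
})

print(pvc_is_bound(sample))  # prints False
```

If this returns False for the real claim, the events shown at the bottom of kubectl describe pvc usually say why provisioning has not finished.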
You can list events with kubectl (kubectl get events). This shows the events recorded for your pod.
To filter for a specific pod, use a field selector:
kubectl get event --namespace abc-namespace --field-selector involvedObject.name=my-pod-zl6m6
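If you run this for several pods, a small Python helper can build the same command line. This is only a sketch: events_command is a hypothetical name, and the namespace and pod name in the usage line are taken from the question, not from the command above.

```python
import shlex
from typing import List


def events_command(namespace: str, pod_name: str) -> List[str]:
    """Build the kubectl argument list that lists events for one pod."""
    return [
        "kubectl", "get", "events",
        "--namespace", namespace,
        # Events reference the object they are about via involvedObject.
        "--field-selector", f"involvedObject.name={pod_name}",
    ]


cmd = events_command("default", "postgres-postgresql-0")
print(shlex.join(cmd))
```

The list can be passed directly to subprocess.run(cmd) on a machine where kubectl is configured for the cluster.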
I have Kubernetes installed on Container Linux by CoreOS alpha (1353.1.0),
using hyperkube v1.5.5_coreos.0 and my fork of the coreos-kubernetes install scripts at https://github.com/kfirufk/coreos-kubernetes.
I have two Container Linux machines:
coreos-2.tux-in.com, which resolves to 192.168.1.2, is the controller
coreos-3.tux-in.com, which resolves to 192.168.1.3, is the worker
kubectl get pods --all-namespaces returns
NAMESPACE NAME READY STATUS RESTARTS AGE
ceph ceph-mds-2743106415-rkww4 0/1 Pending 0 1d
ceph ceph-mon-check-3856521781-bd6k5 1/1 Running 0 1d
kube-lego kube-lego-3323932148-g2tf4 1/1 Running 0 1d
kube-system calico-node-xq6j7 2/2 Running 0 1d
kube-system calico-node-xzpp2 2/2 Running 4560 1d
kube-system calico-policy-controller-610849172-b7xjr 1/1 Running 0 1d
kube-system heapster-v1.3.0-beta.0-2754576759-v1f50 2/2 Running 0 1d
kube-system kube-apiserver-192.168.1.2 1/1 Running 0 1d
kube-system kube-controller-manager-192.168.1.2 1/1 Running 1 1d
kube-system kube-dns-3675956729-r7hhf 3/4 Running 3924 1d
kube-system kube-dns-autoscaler-505723555-l2pph 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.2 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.3 1/1 Running 0 1d
kube-system kube-scheduler-192.168.1.2 1/1 Running 1 1d
kube-system kubernetes-dashboard-3697905830-vdz23 1/1 Running 1246 1d
kube-system monitoring-grafana-4013973156-m2r2v 1/1 Running 0 1d
kube-system monitoring-influxdb-651061958-2mdtf 1/1 Running 0 1d
nginx-ingress default-http-backend-150165654-s4z04 1/1 Running 2 1d
So I can see that kube-dns-3675956729-r7hhf keeps restarting.
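As a quick way to spot every pod with a suspicious restart count in a listing like the one above, here is a small Python sketch. It assumes the six-column layout shown (NAMESPACE NAME READY STATUS RESTARTS AGE); the function name and the threshold of 100 are arbitrary choices for illustration, and the sample rows are copied from the listing.

```python
from typing import List, Tuple


def high_restart_pods(listing: str, threshold: int = 100) -> List[Tuple[str, str, int]]:
    """Return (namespace, pod, restarts) for pods at or above `threshold` restarts."""
    flagged = []
    for line in listing.strip().splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue  # skip the header and malformed lines
        namespace, name, restarts = fields[0], fields[1], fields[4]
        if restarts.isdigit() and int(restarts) >= threshold:
            flagged.append((namespace, name, int(restarts)))
    # Worst offenders first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)


listing = """
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-xq6j7 2/2 Running 0 1d
kube-system calico-node-xzpp2 2/2 Running 4560 1d
kube-system kube-dns-3675956729-r7hhf 3/4 Running 3924 1d
kube-system kubernetes-dashboard-3697905830-vdz23 1/1 Running 1246 1d
"""

for row in high_restart_pods(listing):
    print(row)
```

On these rows it flags not only kube-dns but also calico-node-xzpp2 and kubernetes-dashboard, which may hint that the problem is in pod networking rather than in kube-dns itself.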
kubectl describe pod kube-dns-3675956729-r7hhf --namespace=kube-system returns:
Name: kube-dns-3675956729-r7hhf
Namespace: kube-system
Node: 192.168.1.2/192.168.1.2
Start Time: Sat, 11 Mar 2017 17:54:14 +0000
Labels: k8s-app=kube-dns
pod-template-hash=3675956729
Status: Running
IP: 10.2.67.243
Controllers: ReplicaSet/kube-dns-3675956729
Containers:
kubedns:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:kubedns
Image: gcr.io/google_containers/kubedns-amd64:1.9
Image ID: rkt://sha512-c7b7c9c4393bea5f9dc5bcbe1acf1036c2aca36ac14b5e17fd3c675a396c4219
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-map=kube-dns
--v=2
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: False
Restart Count: 981
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables:
PROMETHEUS_PORT: 10055
dnsmasq:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
Image ID: rkt://sha512-8c5f8b40f6813bb676ce04cd545c55add0dc8af5a3be642320244b74ea03f872
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
Requests:
cpu: 150m
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
dnsmasq-metrics:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq-metrics
Image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
Image ID: rkt://sha512-ceb3b6af1cd67389358be14af36b5e8fb6925e78ca137b28b93e0d8af134585b
Port: 10054/TCP
Args:
--v=2
--logtostderr
Requests:
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
healthz:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:healthz
Image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
Image ID: rkt://sha512-3a85b0533dfba81b5083a93c7e091377123dac0942f46883a4c10c25cf0ad177
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-zqbdp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-zqbdp
QoS Class: Burstable
Tolerations: CriticalAddonsOnly=:Exists
No events.
which shows that the kubedns container (kubedns-amd64:1.9) has Ready: False.
This is my kube-dns-de.yaml file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
spec:
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
containers:
- name: kubedns
image: gcr.io/google_containers/kubedns-amd64:1.9
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthz-kubedns
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-map=kube-dns
# This should be set to v=2 only after the new image (cut from 1.5) has
# been released, otherwise we will flood the logs.
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
- name: dnsmasq
image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
livenessProbe:
httpGet:
path: /healthz-dnsmasq
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --cache-size=1000
- --no-resolv
- --server=127.0.0.1#10053
- --log-facility=-
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 10Mi
- name: dnsmasq-metrics
image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 10Mi
- name: healthz
image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
resources:
limits:
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
args:
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- --url=/healthz-dnsmasq
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
- --url=/healthz-kubedns
- --port=8080
- --quiet
ports:
- containerPort: 8080
protocol: TCP
dnsPolicy: Default
and this is my kube-dns-svc.yaml:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.3.0.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
Any information regarding the issue would be greatly appreciated!
update
rkt list --full 2> /dev/null | grep kubedns shows:
744a4579-0849-4fae-b1f5-cb05d40f3734 kubedns gcr.io/google_containers/kubedns-amd64:1.9 sha512-c7b7c9c4393b running 2017-03-22 22:14:55.801 +0000 UTC 2017-03-22 22:14:56.814 +0000 UTC
journalctl -m _MACHINE_ID=744a457908494faeb1f5cb05d40f3734 provides:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
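A note on the journalctl filter: as far as I understand rkt's journal integration, the _MACHINE_ID value is the pod UUID from rkt list with the dashes removed, which gives 32 hex characters. Treat that as an assumption and check it against the rkt documentation for your version. A small Python sketch of the conversion (rkt_machine_id is a hypothetical helper name):

```python
import uuid


def rkt_machine_id(pod_uuid: str) -> str:
    """Return the pod UUID as 32 lowercase hex characters, without dashes."""
    # uuid.UUID validates the input; .hex is the dash-free form.
    return uuid.UUID(pod_uuid).hex


print(rkt_machine_id("744a4579-0849-4fae-b1f5-cb05d40f3734"))
# prints 744a457908494faeb1f5cb05d40f3734
```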
I tried adding the argument --proxy-mode=userspace to /etc/kubernetes/manifests/kube-proxy.yaml, but the result is the same.
kubectl get svc --all-namespaces provides:
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ceph ceph-mon None <none> 6789/TCP 1h
default kubernetes 10.3.0.1 <none> 443/TCP 1h
kube-system heapster 10.3.0.2 <none> 80/TCP 1h
kube-system kube-dns 10.3.0.10 <none> 53/UDP,53/TCP 1h
kube-system kubernetes-dashboard 10.3.0.116 <none> 80/TCP 1h
kube-system monitoring-grafana 10.3.0.187 <none> 80/TCP 1h
kube-system monitoring-influxdb 10.3.0.214 <none> 8086/TCP 1h
nginx-ingress default-http-backend 10.3.0.233 <none> 80/TCP 1h
kubectl get cs provides:
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
my kube-proxy.yaml has the following content:
apiVersion: v1
kind: Pod
metadata:
name: kube-proxy
namespace: kube-system
annotations:
rkt.alpha.kubernetes.io/stage1-name-override: coreos.com/rkt/stage1-fly
spec:
hostNetwork: true
containers:
- name: kube-proxy
image: quay.io/coreos/hyperkube:v1.5.5_coreos.0
command:
- /hyperkube
- proxy
- --cluster-cidr=10.2.0.0/16
- --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml
securityContext:
privileged: true
volumeMounts:
- mountPath: /etc/ssl/certs
name: "ssl-certs"
- mountPath: /etc/kubernetes/controller-kubeconfig.yaml
name: "kubeconfig"
readOnly: true
- mountPath: /etc/kubernetes/ssl
name: "etc-kube-ssl"
readOnly: true
- mountPath: /var/run/dbus
name: dbus
readOnly: false
volumes:
- hostPath:
path: "/usr/share/ca-certificates"
name: "ssl-certs"
- hostPath:
path: "/etc/kubernetes/controller-kubeconfig.yaml"
name: "kubeconfig"
- hostPath:
path: "/etc/kubernetes/ssl"
name: "etc-kube-ssl"
- hostPath:
path: /var/run/dbus
name: dbus
This is all the relevant information I could find. Any ideas? :)
update 2
The output of iptables-save on the Container Linux controller is at http://pastebin.com/2GApCj0n
update 3
I ran curl on the controller node
# curl https://10.3.0.1 --insecure
Unauthorized
This means the controller can reach the API server properly, and I just didn't pass enough parameters to be authorized, right?
update 4
Thanks to @jaxxstorm, I removed the calico manifests, updated their quay/cni and quay/node versions, and reinstalled them.
Now kubedns still keeps restarting, but I think calico works now, because for the first time kubedns gets scheduled on the worker node rather than the controller node. Also, when I rkt enter the kubedns pod and try to wget https://10.3.0.1, I get:
# wget https://10.3.0.1
Connecting to 10.3.0.1 (10.3.0.1:443)
wget: can't execute 'ssl_helper': No such file or directory
wget: error getting response: Connection reset by peer
which clearly shows that there is some kind of response, which is good, right?
now kubectl get pods --all-namespaces shows:
kube-system kube-dns-3675956729-ljz2w 4/4 Running 88 42m
So 4/4 containers are ready, but the pod keeps restarting.
kubectl describe pod kube-dns-3675956729-ljz2w --namespace=kube-system output at http://pastebin.com/Z70U331G
So it can't connect to http://10.2.47.19:8081/readiness. I'm guessing this is the IP of kubedns, since it uses port 8081. I don't know how to investigate this issue further.
Thanks for everything!
Lots of great debugging info here, thanks!
This is the clincher:
# curl https://10.3.0.1 --insecure
Unauthorized
You got an unauthorized response because you didn't pass a client cert, but that's fine, it's not what we're after. This proves that kube-proxy is working as expected and is accessible. Your rkt logs:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
These logs indicate that there are network connectivity issues inside the containers, which suggests to me that you haven't configured container networking/CNI.
Please have a read through this document: https://coreos.com/rkt/docs/latest/networking/overview.html
You may also have to reconfigure calico; there's some more information here: http://docs.projectcalico.org/master/getting-started/rkt/
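To see why a broken pod network produces "network is unreachable" for 10.3.0.1 specifically, it helps to check which addresses belong to the pod network. The sketch below uses Python's ipaddress module with the 10.2.0.0/16 range from --cluster-cidr in the kube-proxy.yaml above; the classify helper is a hypothetical name, and the IPs are the ones that appear in the question.

```python
import ipaddress

# Pod network range, taken from --cluster-cidr in the question's kube-proxy.yaml.
POD_CIDR = ipaddress.ip_network("10.2.0.0/16")


def classify(ip: str) -> str:
    """Say whether an address falls inside the pod network."""
    addr = ipaddress.ip_address(ip)
    return "pod network" if addr in POD_CIDR else "outside pod network"


# kube-dns pod IP, kubernetes Service cluster IP, controller host IP.
for ip in ("10.2.67.243", "10.3.0.1", "192.168.1.2"):
    print(ip, "->", classify(ip))
```

The Service IP 10.3.0.1 is outside the pod range. It is a virtual address handled by kube-proxy's rules on the host, so a pod can only reach it if its network namespace has a working route out, which is what the CNI plugin is supposed to set up.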
kube-dns has a readiness probe that tries resolving through the Service IP of kube-dns. Is it possible that there is a problem with your Service network?
Check out the answer and solution here:
kubernetes service IPs not reachable