Kube-dns always in pending state - kubernetes

I have deployed kubernetes on a virt-manager vm following this link
https://kubernetes.io/docs/setup/independent/install-kubeadm/
When i join my another vm to the cluster i find that the kube-dns is in pending state.
root#ubuntu1:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-ubuntu1 1/1 Running 0 7m
kube-system kube-apiserver-ubuntu1 1/1 Running 0 8m
kube-system kube-controller-manager-ubuntu1 1/1 Running 0 8m
kube-system kube-dns-86f4d74b45-br6ck 0/3 Pending 0 8m
kube-system kube-proxy-sh9lg 1/1 Running 0 8m
kube-system kube-proxy-zwdt5 1/1 Running 0 7m
kube-system kube-scheduler-ubuntu1 1/1 Running 0 8m
root#ubuntu1:~# kubectl --namespace=kube-system describe pod kube-dns-86f4d74b45-br6ck
Name: kube-dns-86f4d74b45-br6ck
Namespace: kube-system
Node: <none>
Labels: k8s-app=kube-dns
pod-template-hash=4290830601
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/kube-dns-86f4d74b45
Containers:
kubedns:
Image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.8
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-dir=/kube-dns-config
--v=2
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/kube-dns-config from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-4fjt4 (ro)
dnsmasq:
Image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.8
Ports: 53/UDP, 53/TCP
Host Ports: 0/UDP, 0/TCP
Args:
-v=2
-logtostderr
-configDir=/etc/k8s/dns/dnsmasq-nanny
-restartDnsmasq=true
--
-k
--cache-size=1000
--no-negcache
--log-facility=-
--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/ip6.arpa/127.0.0.1#10053
Requests:
cpu: 150m
memory: 20Mi
Liveness: http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-4fjt4 (ro)
sidecar:
Image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.8
Port: 10054/TCP
Host Port: 0/TCP
Args:
--v=2
--logtostderr
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
Requests:
cpu: 10m
memory: 20Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-4fjt4 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-dns-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-dns
Optional: true
kube-dns-token-4fjt4:
Type: Secret (a volume populated by a Secret)
SecretName: kube-dns-token-4fjt4
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m (x7 over 7m) default-scheduler 0/1 nodes are available: 1 node(s) were not ready.
Warning FailedScheduling 3s (x19 over 6m) default-scheduler 0/2 nodes are available: 2 node(s) were not ready.
Can anyone just help me how to deconstruct this and find the actual issue??
Any help would be off great use
Thanks in advance.

In addition to what #justcompile has wrote you will need a minimum of 2 CPU cores in order to run all pods from the kube-system namespace without issues.
You need to verify how much resources you have on that box and compare it with CPU reservations which each of Pods make.
For example in the provided by you output I can see that your DNS service tries to make a reservetion for 10% of CPU core:
Requests:
cpu: 100m
You can check each of deployed pods and their CPU reservations using:
kubectl describe pods --namespace=kube-system

in your cause kubectl get pods --all-namespaces output cannot see any about pods network.
so you may choice a network implementation and have to install a Pod Network before then kube-dns may deployed fully. for detail kube-dns is stuck in the Pending state and install pod network solution

Firstly, if you run kubectl get nodes does this show both/all nodes in a Ready state?
If they are, I faced this problem and found that when inspecting kubectl get events it showed that the pods were failing as they required a minimum of 2 CPUs to run.
As I was initially running this on an old Macbook Pro via VirtualBox I had to give up and use AWS (other Cloud Platforms are of course available) in order to get multiple CPUs per node.

Related

Why do I keep getting error "5 pod has unbound immediate PersistentVolumeClaims"?

I am following the book Kubernetes for developers and seems maybe book is heavily outdated now.
Recently I have been trying to get prometheus up and running on kubernetes following the instruction from book. That suggested to install and use HELM to get Prometheus and grafana up and running.
helm install monitor stable/prometheus --namespace monitoring
this resulted:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
monitor-kube-state-metrics-578cdbb5b7-pdjzw 0/1 CrashLoopBackOff 14 36m 192.168.23.1 kube-worker-vm3 <none> <none>
monitor-prometheus-alertmanager-7b4c476678-gr4s6 0/2 Pending 0 35m <none> <none> <none> <none>
monitor-prometheus-node-exporter-5kz8x 1/1 Running 0 14h 192.168.1.13 rockpro64 <none> <none>
monitor-prometheus-node-exporter-jjrjh 1/1 Running 1 14h 192.168.1.35 osboxes <none> <none>
monitor-prometheus-node-exporter-k62fn 1/1 Running 1 14h 192.168.1.37 kube-worker-vm3 <none> <none>
monitor-prometheus-node-exporter-wcg2k 1/1 Running 1 14h 192.168.1.36 kube-worker-vm2 <none> <none>
monitor-prometheus-pushgateway-6898f8475b-sk4dz 1/1 Running 0 36m 192.168.90.200 osboxes <none> <none>
monitor-prometheus-server-74d7dc5d4c-vlqmm 0/2 Pending 0 14h <none> <none> <none
For the prometheus server I checked why is it Pending:
# kubectl describe pod monitor-prometheus-server-74d7dc5d4c-vlqmm -n monitoring
Name: monitor-prometheus-server-74d7dc5d4c-vlqmm
Namespace: monitoring
Priority: 0
Node: <none>
Labels: app=prometheus
chart=prometheus-13.8.0
component=server
heritage=Helm
pod-template-hash=74d7dc5d4c
release=monitor
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/monitor-prometheus-server-74d7dc5d4c
Containers:
prometheus-server-configmap-reload:
Image: jimmidyson/configmap-reload:v0.4.0
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://127.0.0.1:9090/-/reload
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from monitor-prometheus-server-token-n49ls (ro)
prometheus-server:
Image: prom/prometheus:v2.20.1
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention.time=15d
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=15s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=30s timeout=30s period=5s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from monitor-prometheus-server-token-n49ls (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: monitor-prometheus-server
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: monitor-prometheus-server
ReadOnly: false
monitor-prometheus-server-token-n49ls:
Type: Secret (a volume populated by a Secret)
SecretName: monitor-prometheus-server-token-n49ls
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28m (x734 over 14h) default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 3m5s (x23 over 24m) default-scheduler 0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
r
However this message I am seeing 0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims. is coming with all other nodejs's StatefulSets and rabbitmq Deployments I have tried created. for rabbitmq and nodejs I figured out I need to create a PersistantVolume and a storage class whose name I needed to specify in the PV and PVC. and then it all worked but now I have Prometheus Server, Do I have to do the same for prometheus as well ? why is it not instructed by the HELM ?
Has something change in the Kubernetes API recently ? that I always have to create a PV and Storage Class explicitly for a PVC ?
Unless you configure your cluster with dynamic volume provisioning , you will have to make the PV manually each time. Even if you are not on a cloud, you can setup dynamic storage providers. There are a number of options for providers and you can find many here. Ceph and minio are popular providers.

Kubernetes ingress controller - Error: ImagePullBackOff

I'm unable to get the controller working. Tried many times and still I get Error: ImagePullBackOff.
Is there a alternative that I can try or any idea why its failing?
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.27.0/deploy/static/mandatory.yaml
kubectl describe pod nginx-ingress-controller-7fcb6cffc5-m8m5c -n ingress-nginx
Name: nginx-ingress-controller-7fcb6cffc5-m8m5c
Namespace: ingress-nginx
Priority: 0
Node: ip-10-0-0-244.ap-south-1.compute.internal/10.0.0.244
Start Time: Mon, 07 Dec 2020 08:21:13 -0500
Labels: app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
pod-template-hash=7fcb6cffc5
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu, memory request for container nginx-ingress-controller
kubernetes.io/psp: eks.privileged
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Pending
IP: 10.0.0.231
IPs:
IP: 10.0.0.231
Controlled By: ReplicaSet/nginx-ingress-controller-7fcb6cffc5
Containers:
nginx-ingress-controller:
Container ID:
Image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master
Image ID:
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--publish-service=$(POD_NAMESPACE)/ingress-nginx
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=0s timeout=10s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-7fcb6cffc5-m8m5c (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-xtnz9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-xtnz9:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-xtnz9
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19s default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-7fcb6cffc5-m8m5c to ip-10-0-0-244.ap-south-1.compute.internal
Normal Pulling 18s kubelet Pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master"
Warning Failed 3s kubelet Failed to pull image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 3s kubelet Error: ErrImagePull
Normal BackOff 3s kubelet Back-off pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master"
Warning Failed 3s kubelet Error: ImagePullBackOff
I had the same problem, with the ingress-nginx installation.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/cloud/deploy.yaml
For some reason it couldn't get the ingress-nginx-controller.
$ kubectl get pods --namespace=ingress-nginx
NAME READY STATUS RE
ingress-nginx-admission-create-6q4wx 0/1 Completed 0
ingress-nginx-admission-patch-fr5ct 0/1 Completed 1
ingress-nginx-controller-686556747b-dg68h 0/1 ImagePullBackOff 0
What I did was, I ran $ kubectl describe pod ingress-nginx-controller-686556747b-dg68h --namespace ingress-nginx
and got the following output:
Name: ingress-nginx-controller-686556747b-dg68h
Namespace: ingress-nginx
Priority: 0
Node: docker-desktop/x.x.x.x
Start Time: Wed, 11 May 2022 20:11:55 +0430
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
pod-template-hash=686556747b
Annotations: <none>
Status: Pending
IP: x.x.x.x
IPs:
IP: x.x.x.x
Controlled By: ReplicaSet/ingress-nginx-controller-686556747b
Containers:
controller:
Container ID:
Image: k8s.gcr.io/ingress-nginx/controller:v1.2.0#sha256:d819
Image ID:
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
--election-id=ingress-controller-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s perio
Readiness: http-get http://:10254/healthz delay=10s timeout=1s perio
Environment:
POD_NAME: ingress-nginx-controller-686556747b-dg68h (v1:metad
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
From Containers.controller.Image, I got the image name that kubernetes is trying to download but is unsuccessful to do so and tried to docker pull that image myself like so:
docker pull k8s.gcr.io/ingress-nginx/controller:v1.2.0#sha256:d819
Docker could pull the image successfully and after that everything worked just fine.
It's failing because kubernetes cannot download the specified image. Check the events section
Warning Failed 3s kubelet Failed to pull image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Maybe you dont have internet connectivity or this image does not exist. You can try running docker pull quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master from your computer
As mentioned by John, creating a nat router and nat config allowed docker images to be pulled when I was facing the same issue. If you create a vpc native GKE cluster which is private it by default has no access to the internet. Unless you deploy a NAT router.
gcloud compute routers create nat-router \
--network my-vpc \
--region us-east4
gcloud compute routers nats create nat-config \
--router-region us-east4 \
--router nat-router \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-service
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
rules:
- host: your host name
http:
paths:
- backend:
service:
name: your service name
port:
number: 3000
path: /api/?(.*)
pathType: Prefix
I think this YAML file solves your problem. I faced the same issue.

How to run Tiller on Kubernetes cluster on AWS EKS

I created EKS Kubernetes cluster with terraform. It all went fine, cluster is created and there is one EC2 machine on it. However, I can't init helm and install Tiller there. All the code is on https://github.com/amorfis/aws-eks-terraform
As stated in README.md, after cluster creation I update ~/.kube/config, create rbac, and try to init helm. However, it's pod is still pending:
$> kubectl --namespace kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-7554568866-8mnsm 0/1 Pending 0 3h
coredns-7554568866-mng65 0/1 Pending 0 3h
tiller-deploy-77c96688d7-87rb8 0/1 Pending 0 1h
As well as other 2 coredns pods.
What am i missing?
UPDATE: Output of describe:
$> kubectl describe pod tiller-deploy-77c96688d7-87rb8 --namespace kube-system
Name: tiller-deploy-77c96688d7-87rb8
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: app=helm
name=tiller
pod-template-hash=3375224483
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/tiller-deploy-77c96688d7
Containers:
tiller:
Image: gcr.io/kubernetes-helm/tiller:v2.12.2
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from tiller-token-b9x6d (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tiller-token-b9x6d:
Type: Secret (a volume populated by a Secret)
SecretName: tiller-token-b9x6d
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
try to allow the master to run pods
according to this issue issue form githup
kubectl taint nodes --all node-role.kubernetes.io/master-

Kubernetes reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)" even though it would?

I don't understand why I'm receiving this error. A new node should definitely be able to accommodate the pod. As I'm only requesting 768Mi of memory and 450m of CPU, and the instance group that would be autoscaled is of type n1-highcpu-2 - 2 vCPU, 1.8GB.
How could I diagnose this further?
kubectl describe pod:
Name: initial-projectinitialabcrad-697b74b449-848bl
Namespace: production
Node: <none>
Labels: app=initial-projectinitialabcrad
appType=abcrad-api
pod-template-hash=2536306005
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/initial-projectinitialabcrad-697b74b449
Containers:
app:
Image: gcr.io/example-project-abcsub/projectinitial-abcrad-app:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
Port: <none>
Host Port: <none>
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 250m
memory: 512Mi
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
nginx:
Image: gcr.io/example-project-abcsub/projectinitial-abcrad-nginx:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
Port: 80/TCP
Host Port: 0/TCP
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:80/api/v1/ping delay=5s timeout=10s period=10s #success=1 #failure=3
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
cloudsql-proxy:
Image: gcr.io/cloudsql-docker/gce-proxy:1.11
Port: 3306/TCP
Host Port: 0/TCP
Command:
/cloud_sql_proxy
-instances=example-project-abcsub:us-central1:abcfn-staging=tcp:0.0.0.0:3306
-credential_file=/secrets/cloudsql/credentials.json
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Mounts:
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
default-token-srv8k:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-srv8k
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 4m (x29706 over 3d) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added)
Warning FailedScheduling 4m (x18965 over 3d) default-scheduler 0/4 nodes are available: 3 Insufficient memory, 4 Insufficient cpu.
It's not the hardware requests (confusingly the error message made me assume this) but it's due to my pod affinity rule defined:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: appType
operator: NotIn
values:
- example-api
topologyKey: kubernetes.io/hostname
If you are using K8s from a cloud provider like GKE/EKS, maybe it is worth taking a look at the cloud provider resource quota!
Even everything looks reasonable, K8s gave the same error "pod didn't trigger scale-up"! And that was because the CPU quota was exhausted! (K8s has nothing to do with that limitation, so the error is not clear from K8s side).
Running on AWS EKS, I received the same message when the Auto Scaling Group that was configured for the pods (with a taint and toleration) reached the maximum capacity. Enlarging the max capacity caused new nodes to be created and the pods to start running.

k8s Prometheus:pod has unbound PersistentVolumeClaims

I install kube1.10.3 in two virtualbox(centos 7.4) in my win10 machine. I use git clone to get prometheus yaml files.
git clone https://github.com/kubernetes/kubernetes
Then I enter kubernetes/cluster/addons/prometheus annd follow this order to create pods:
alertmanager-configmap.yaml
alertmanager-pvc.yaml
alertmanager-deployment.yaml
alertmanager-service.yaml
kube-state-metrics-rbac.yaml
kube-state-metrics-deployment.yaml
kube-state-metrics-service.yaml
node-exporter-ds.yml
node-exporter-service.yaml
prometheus-configmap.yaml
prometheus-rbac.yaml
prometheus-statefulset.yaml
prometheus-service.yaml
But Prometheus and alertmanage are in pending state:
kube-system alertmanager-6bd9584b85-j4h5m 0/2 Pending 0 9m
kube-system calico-etcd-pnwtr 1/1 Running 0 16m
kube-system calico-kube-controllers-5d74847676-mjq4j 1/1 Running 0 16m
kube-system calico-node-59xfk 2/2 Running 1 16m
kube-system calico-node-rqsh5 2/2 Running 1 16m
kube-system coredns-7997f8864c-ckhsq 1/1 Running 0 16m
kube-system coredns-7997f8864c-jjtvq 1/1 Running 0 16m
kube-system etcd-master16g 1/1 Running 0 15m
kube-system heapster-589b7db6c9-mpmks 1/1 Running 0 16m
kube-system kube-apiserver-master16g 1/1 Running 0 15m
kube-system kube-controller-manager-master16g 1/1 Running 0 15m
kube-system kube-proxy-hqq49 1/1 Running 0 16m
kube-system kube-proxy-l8hmh 1/1 Running 0 16m
kube-system kube-scheduler-master16g 1/1 Running 0 16m
kube-system kube-state-metrics-8595f97c4-g6x5x 2/2 Running 0 8m
kube-system kubernetes-dashboard-7d5dcdb6d9-944xl 1/1 Running 0 16m
kube-system monitoring-grafana-7b767fb8dd-mg6dd 1/1 Running 0 16m
kube-system monitoring-influxdb-54bd58b4c9-z9tgd 1/1 Running 0 16m
kube-system node-exporter-f6pmw 1/1 Running 0 8m
kube-system node-exporter-zsd9b 1/1 Running 0 8m
kube-system prometheus-0 0/2 Pending 0 7m
I checked prometheus pod by command shown below:
[root#master16g prometheus]# kubectl describe pod prometheus-0 -n kube-system
Name: prometheus-0
Namespace: kube-system
Node: <none>
Labels: controller-revision-hash=prometheus-8fc558cb5
k8s-app=prometheus
statefulset.kubernetes.io/pod-name=prometheus-0
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Pending
IP:
Controlled By: StatefulSet/prometheus
Init Containers:
init-chown-data:
Image: busybox:latest
Port: <none>
Host Port: <none>
Command:
chown
-R
65534:65534
/data
Environment: <none>
Mounts:
/data from prometheus-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-f6v42 (ro)
Containers:
prometheus-server-configmap-reload:
Image: jimmidyson/configmap-reload:v0.1
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9090/-/reload
Limits:
cpu: 10m
memory: 10Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-f6v42 (ro)
prometheus-server:
Image: prom/prometheus:v2.2.1
Port: 9090/TCP
Host Port: 0/TCP
Args:
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
Limits:
cpu: 200m
memory: 1000Mi
Requests:
cpu: 200m
memory: 1000Mi
Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from prometheus-data (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-f6v42 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
prometheus-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: prometheus-data-prometheus-0
ReadOnly: false
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-config
Optional: false
prometheus-token-f6v42:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-token-f6v42
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 42s (x22 over 5m) default-scheduler pod has unbound PersistentVolumeClaims (repeated 2 times)
In the last line, it shows warning message: pod has unbound PersistentVolumeClaims (repeated 2 times)
The Prometheus logs says:
[root#master16g prometheus]# kubectl logs prometheus-0 -n kube-system
Error from server (BadRequest): a container name must be specified for pod prometheus-0, choose one of: [prometheus-server-configmap-reload prometheus-server] or one of the init containers: [init-chown-data]
The I describe alertmanager pod and its logs:
[root#master16g prometheus]# kubectl describe pod alertmanager-6bd9584b85-j4h5m -n kube-system
Name: alertmanager-6bd9584b85-j4h5m
Namespace: kube-system
Node: <none>
Labels: k8s-app=alertmanager
pod-template-hash=2685140641
version=v0.14.0
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Pending
IP:
Controlled By: ReplicaSet/alertmanager-6bd9584b85
Containers:
prometheus-alertmanager:
Image: prom/alertmanager:v0.14.0
Port: 9093/TCP
Host Port: 0/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
--web.external-url=/
Limits:
cpu: 10m
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-snfrt (ro)
prometheus-alertmanager-configmap-reload:
Image: jimmidyson/configmap-reload:v0.1
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
Limits:
cpu: 10m
memory: 10Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-snfrt (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: alertmanager-config
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alertmanager
ReadOnly: false
default-token-snfrt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-snfrt
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x26 over 9m) default-scheduler pod has unbound PersistentVolumeClaims (repeated 2 times)
And its log:
[root#master16g prometheus]# kubectl logs alertmanager-6bd9584b85-j4h5m -n kube-system
Error from server (BadRequest): a container name must be specified for pod alertmanager-6bd9584b85-j4h5m, choose one of: [prometheus-alertmanager prometheus-alertmanager-configmap-reload]
It has same warning message as Prometheus:
pod has unbound PersistentVolumeClaims (repeated 2 times)
Then I get pvc by issuing command as follows:
[root#master16g prometheus]# kubectl get pvc --all-namespaces
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
kube-system alertmanager Pending standard 20m
kube-system prometheus-data-prometheus-0 Pending standard 19m
My question is how to make bound persistentVolumnClaim? Why log says container name must be specified?
===============================================================
Second edition
Since pvc file defined storage class, so I need to define a storage class yaml. How to do it if I want Nfs or GlusterFs? In this way, I could avoid cloud vendor, like Google or AWS.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
storageClassName: standard
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"
This log entry:
Error from server (BadRequest): a container name must be specified for pod alertmanager-6bd9584b85-j4h5m, choose one of: [prometheus-alertmanager prometheus-alertmanager-configmap-reload]
means Pod alertmanager-6bd9584b85-j4h5m consists of two containers:
prometheus-alertmanager
prometheus-alertmanager-configmap-reload
When you use kubectl logs for Pod which consists of more then one containers you must specify a name of the container to view its logs. Command template:
kubectl -n <namespace> logs <pod_name> <container_name>
For example, if you want to view logs of the container prometheus-alertmanager which is a part of Pod alertmanager-6bd9584b85-j4h5m in the namespace kube-system you should use this command:
kubectl -n kube-system logs alertmanager-6bd9584b85-j4h5m prometheus-alertmanager
Pending status of the PVCs could mean you have no corresponding PVs