I created an EKS Kubernetes cluster with Terraform. It all went fine: the cluster is created and there is one EC2 machine in it. However, I can't init Helm and install Tiller there. All the code is at https://github.com/amorfis/aws-eks-terraform
As stated in README.md, after cluster creation I update ~/.kube/config, create the RBAC resources, and try to init Helm. However, its pod is still pending:
$> kubectl --namespace kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-7554568866-8mnsm 0/1 Pending 0 3h
coredns-7554568866-mng65 0/1 Pending 0 3h
tiller-deploy-77c96688d7-87rb8 0/1 Pending 0 1h
The two coredns pods are pending as well.
What am I missing?
UPDATE: Output of describe:
$> kubectl describe pod tiller-deploy-77c96688d7-87rb8 --namespace kube-system
Name: tiller-deploy-77c96688d7-87rb8
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: app=helm
name=tiller
pod-template-hash=3375224483
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/tiller-deploy-77c96688d7
Containers:
tiller:
Image: gcr.io/kubernetes-helm/tiller:v2.12.2
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from tiller-token-b9x6d (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tiller-token-b9x6d:
Type: Secret (a volume populated by a Secret)
SecretName: tiller-token-b9x6d
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Try to allow the master to run pods, according to this issue from GitHub:
kubectl taint nodes --all node-role.kubernetes.io/master-
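Before and after removing the taint, it is also worth confirming that the scheduler has any schedulable nodes at all. A quick check with plain kubectl (nothing specific to this setup):

kubectl get nodes
kubectl describe nodes | grep -i taints

If kubectl get nodes returns no nodes, the EC2 worker never registered with the cluster, and the pods will stay Pending regardless of taints.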
Environmental Info:
K3s Version:
k3s version v1.24.3+k3s1 (990ba0e8)
go version go1.18.1
Node(s) CPU architecture, OS, and Version:
Five RPi 4s running headless 64-bit Raspbian, each with the following information:
Linux 5.15.56-v8+ #1575 SMP PREEMPT Fri Jul 22 20:31:26 BST 2022 aarch64 GNU/Linux
Cluster Configuration:
3 Nodes configured as control plane, 2 Nodes as Worker Nodes
Describe the bug:
The pods coredns-b96499967-ktgtc, local-path-provisioner-7b7dc8d6f5-5cfds, metrics-server-668d979685-9szb9, traefik-7cd4fcff68-gfmhm, and svclb-traefik-aa9f6b38-j27sw are at status Unknown, with 0/1 containers ready. This means that the cluster DNS service does not work and therefore pods are not able to resolve internal or external names.
Steps To Reproduce:
Installed K3s in HA mode using the following instructions: https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
Expected behavior:
The important pods should be running, with a known status. Additionally, DNS should work, which means that, among other things, headless services should work and pods should be able to resolve hostnames inside and outside the cluster.
Actual behavior:
The DNS pods are in an unknown state with 0/1 containers ready, pods are not able to resolve hostnames inside or outside the cluster, and headless services do not work.
Additional context / logs:
kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
hosts /etc/coredns/NodeHosts {
ttl 60
reload 15s
fallthrough
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
import /etc/coredns/custom/*.server
Description of Relevant Pods:
kubectl describe pods --namespace=kube-system
Name: coredns-b96499967-ktgtc
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=kube-dns
pod-template-hash=b96499967
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-b96499967
Containers:
coredns:
Container ID: containerd://1a83a59275abdb7b783aa06eb56cb1e5367c1ca196598851c2b7d5154c0a4bb9
Image: rancher/mirrored-coredns-coredns:1.9.1
Image ID: docker.io/rancher/mirrored-coredns-coredns@sha256:35e38f3165a19cb18c65d83334c13d61db6b24905f45640aa8c2d2a6f55ebcb0
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/etc/coredns/custom from custom-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zbbxf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
custom-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns-custom
Optional: true
kube-api-access-zbbxf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x419 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11421 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m24s (x139 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Name: metrics-server-668d979685-9szb9
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=metrics-server
pod-template-hash=668d979685
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/metrics-server-668d979685
Containers:
metrics-server:
Container ID: containerd://cd02643f7d7bc78ea98abdec20558626cfac39f70e1127b2281342dd00905e44
Image: rancher/mirrored-metrics-server:v0.5.2
Image ID: docker.io/rancher/mirrored-metrics-server@sha256:48ecad4fe641a09fa4459f93c7ad29d4916f6b9cf7e934d548f1d8eff96e2f35
Port: 4443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-djqgk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-djqgk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x418 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11427 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m27s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Name: traefik-7cd4fcff68-gfmhm
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:10:43 +0100
Labels: app.kubernetes.io/instance=traefik
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=traefik
helm.sh/chart=traefik-10.19.300
pod-template-hash=7cd4fcff68
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 9100
prometheus.io/scrape: true
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/traefik-7cd4fcff68
Containers:
traefik:
Container ID: containerd://779a1596fb204a7577acda97e9fb3f4c5728cf1655071d8e5faad6a8d407d217
Image: rancher/mirrored-library-traefik:2.6.2
Image ID: docker.io/rancher/mirrored-library-traefik@sha256:ad2226527eea71b7591d5e9dcc0bffd0e71b2235420c34f358de6db6d529561f
Ports: 9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
--global.checknewversion
--global.sendanonymoususage
--entrypoints.metrics.address=:9100/tcp
--entrypoints.traefik.address=:9000/tcp
--entrypoints.web.address=:8000/tcp
--entrypoints.websecure.address=:8443/tcp
--api.dashboard=true
--ping=true
--metrics.prometheus=true
--metrics.prometheus.entrypoint=metrics
--providers.kubernetescrd
--providers.kubernetesingress
--providers.kubernetesingress.ingressendpoint.publishedservice=kube-system/traefik
--entrypoints.websecure.http.tls=true
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Liveness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=3
Readiness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=1
Environment: <none>
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw4qc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-jw4qc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x415 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11418 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m30s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
The solution that I found to resolve the problem, at least for now, was to manually restart all of the kube-system deployments. They can be listed using the command
kubectl get deployments --namespace=kube-system
If all of them are similarly not ready, they can be restarted one by one using the command
kubectl -n kube-system rollout restart <deployment>
Specifically, the coredns, local-path-provisioner, metrics-server, and traefik deployments all needed to be restarted.
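If every deployment in the namespace is affected, they can also be restarted in one go. A minimal shell sketch (assumes a POSIX shell with xargs available):

kubectl -n kube-system get deployments -o name \
  | xargs -n1 kubectl -n kube-system rollout restart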
I am following the book Kubernetes for Developers, and it seems the book may be heavily outdated now.
Recently I have been trying to get Prometheus up and running on Kubernetes, following the instructions from the book. It suggested installing and using Helm to get Prometheus and Grafana up and running:
helm install monitor stable/prometheus --namespace monitoring
This resulted in:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
monitor-kube-state-metrics-578cdbb5b7-pdjzw 0/1 CrashLoopBackOff 14 36m 192.168.23.1 kube-worker-vm3 <none> <none>
monitor-prometheus-alertmanager-7b4c476678-gr4s6 0/2 Pending 0 35m <none> <none> <none> <none>
monitor-prometheus-node-exporter-5kz8x 1/1 Running 0 14h 192.168.1.13 rockpro64 <none> <none>
monitor-prometheus-node-exporter-jjrjh 1/1 Running 1 14h 192.168.1.35 osboxes <none> <none>
monitor-prometheus-node-exporter-k62fn 1/1 Running 1 14h 192.168.1.37 kube-worker-vm3 <none> <none>
monitor-prometheus-node-exporter-wcg2k 1/1 Running 1 14h 192.168.1.36 kube-worker-vm2 <none> <none>
monitor-prometheus-pushgateway-6898f8475b-sk4dz 1/1 Running 0 36m 192.168.90.200 osboxes <none> <none>
monitor-prometheus-server-74d7dc5d4c-vlqmm 0/2 Pending 0 14h <none> <none> <none> <none>
For the Prometheus server I checked why it is Pending:
# kubectl describe pod monitor-prometheus-server-74d7dc5d4c-vlqmm -n monitoring
Name: monitor-prometheus-server-74d7dc5d4c-vlqmm
Namespace: monitoring
Priority: 0
Node: <none>
Labels: app=prometheus
chart=prometheus-13.8.0
component=server
heritage=Helm
pod-template-hash=74d7dc5d4c
release=monitor
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/monitor-prometheus-server-74d7dc5d4c
Containers:
prometheus-server-configmap-reload:
Image: jimmidyson/configmap-reload:v0.4.0
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://127.0.0.1:9090/-/reload
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from monitor-prometheus-server-token-n49ls (ro)
prometheus-server:
Image: prom/prometheus:v2.20.1
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention.time=15d
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=15s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=30s timeout=30s period=5s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from monitor-prometheus-server-token-n49ls (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: monitor-prometheus-server
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: monitor-prometheus-server
ReadOnly: false
monitor-prometheus-server-token-n49ls:
Type: Secret (a volume populated by a Secret)
SecretName: monitor-prometheus-server-token-n49ls
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28m (x734 over 14h) default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 3m5s (x23 over 24m) default-scheduler 0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims.
However, this message, 0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims., also comes up with all the other Node.js StatefulSets and RabbitMQ Deployments I have tried to create. For RabbitMQ and Node.js I figured out that I needed to create a PersistentVolume and a StorageClass, whose name I then specified in the PV and the PVC, and after that it all worked. Now that I am deploying the Prometheus server, do I have to do the same for Prometheus as well? Why is that not handled by the Helm chart?
Has something changed in the Kubernetes API recently, so that I always have to create a PV and a StorageClass explicitly for a PVC?
Unless you configure your cluster with dynamic volume provisioning, you will have to create the PV manually each time. Even if you are not on a cloud, you can set up dynamic storage providers. There are a number of options for providers and you can find many here. Ceph and MinIO are popular providers.
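If you only need the chart's PVC to bind, a manually created PV with a compatible capacity and access mode is enough. A minimal sketch (hostPath is only suitable for single-node or test clusters; the path and the 8Gi size are assumptions, check the actual request with kubectl get pvc -n monitoring):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-server-pv
spec:
  capacity:
    storage: 8Gi                      # must be >= the PVC's request
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/prometheus-server     # hypothetical path on the node

Kubernetes binds the pending PVC to this PV as long as the size and access modes are compatible and the storageClassName of the PV and PVC match (both left unset here, assuming the chart did not set one).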
I created a cluster on DigitalOcean using kubeadm and 3 droplets. Since this is not a managed Kubernetes cluster from DigitalOcean, how do I manually set up a LoadBalancer?
I've tried adding an external load balancer by adding the following lines to a deployment config file
...
replicaCount: 1
image:
repository: turfff/node-replicas
tag: latest
pullPolicy: IfNotPresent
...
service:
type: LoadBalancer
port: 80
targetPort: 8080
...
However, when I apply the configuration and check for the created svc:
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 13d
mongo-mongodb-replicaset ClusterIP None <none> 27017/TCP 3h15m
mongo-mongodb-replicaset-client ClusterIP None <none> 27017/TCP 3h15m
nodejs-nodeapp LoadBalancer 10.109.213.98 <pending> 80:31769/TCP 61m
kubectl describe svc nodejs-nodeapp
Name: nodejs-nodeapp
Namespace: default
Labels: app.kubernetes.io/instance=nodejs
app.kubernetes.io/managed-by=Tiller
app.kubernetes.io/name=nodeapp
app.kubernetes.io/version=1.0
helm.sh/chart=nodeapp-0.1.0
Annotations: <none>
Selector: app.kubernetes.io/instance=nodejs,app.kubernetes.io/name=nodeapp
Type: LoadBalancer
IP: 10.109.213.98
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31769/TCP
Endpoints: 10.244.2.19:8080
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
kubectl get pods
NAME READY STATUS RESTARTS AGE
mongo-mongodb-replicaset-0 1/1 Running 0 3h18m
mongo-mongodb-replicaset-1 1/1 Running 0 3h17m
mongo-mongodb-replicaset-2 1/1 Running 0 3h16m
nodejs-nodeapp-7b89db8888-sjcbq 1/1 Running 0 65m
kubectl describe pod nodejs-nodeapp
Name: nodejs-nodeapp-7b89db8888-sjcbq
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: worker-02/206.81.3.65
Start Time: Sun, 14 Jun 2020 11:21:07 +0100
Labels: app.kubernetes.io/instance=nodejs
app.kubernetes.io/name=nodeapp
pod-template-hash=7b89db8888
Annotations: <none>
Status: Running
IP: 10.244.2.19
Controlled By: ReplicaSet/nodejs-nodeapp-7b89db8888
Containers:
nodeapp:
Container ID: docker://f0d4d01f....
Image: turfff/node-replicas:latest
Image ID: docker-pullable://turfff/node-replicas@sha256:34d...
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 14 Jun 2020 11:21:08 +0100
Ready: True
Restart Count: 0
Liveness: http-get http://:http/sharks delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/sharks delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
MONGO_USERNAME: <set to the key 'MONGO_USERNAME' in secret 'nodejs-auth'> Optional: false
MONGO_PASSWORD: <set to the key 'MONGO_PASSWORD' in secret 'nodejs-auth'> Optional: false
MONGO_HOSTNAME: <set to the key 'MONGO_HOSTNAME' of config map 'nodejs-config'> Optional: false
MONGO_PORT: <set to the key 'MONGO_PORT' of config map 'nodejs-config'> Optional: false
MONGO_DB: <set to the key 'MONGO_DB' of config map 'nodejs-config'> Optional: false
MONGO_REPLICASET: <set to the key 'MONGO_REPLICASET' of config map 'nodejs-config'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nodejs-nodeapp-token-4wxvd (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
nodejs-nodeapp-token-4wxvd:
Type: Secret (a volume populated by a Secret)
SecretName: nodejs-nodeapp-token-4wxvd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
It fails to create a load balancer. How do I manually set up the LoadBalancer?
I would not recommend configuring load balancers manually. You can automate this if you install the DigitalOcean cloud controller manager, which is the Kubernetes cloud controller manager implementation for DigitalOcean. Read more about cloud controller managers here.
The DigitalOcean cloud controller manager runs the service controller, which is responsible for watching Services of type LoadBalancer and creating DO load balancers to satisfy their requirements. Here are examples of how it is used.
Here is a YAML file that you can use to deploy this on your Kubernetes cluster. It needs a DigitalOcean API token to be placed in the access-token: section of the manifest.
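A minimal sketch of the token part (the Secret name, namespace, and key shown here follow the CCM README at the time of writing; verify them against the release manifest you actually deploy):

apiVersion: v1
kind: Secret
metadata:
  name: digitalocean
  namespace: kube-system
stringData:
  access-token: "put-your-DO-API-token-here"   # placeholder, never commit a real token

After kubectl apply -f on that Secret and on the CCM release manifest from the digitalocean-cloud-controller-manager repository, the service controller should provision a DO load balancer for the nodejs-nodeapp Service and its EXTERNAL-IP should move from <pending> to a real address.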
I installed Kubernetes 1.10.3 on two VirtualBox VMs (CentOS 7.4) on my Windows 10 machine. I used git clone to get the Prometheus YAML files:
git clone https://github.com/kubernetes/kubernetes
Then I enter kubernetes/cluster/addons/prometheus and follow this order to create the pods:
alertmanager-configmap.yaml
alertmanager-pvc.yaml
alertmanager-deployment.yaml
alertmanager-service.yaml
kube-state-metrics-rbac.yaml
kube-state-metrics-deployment.yaml
kube-state-metrics-service.yaml
node-exporter-ds.yml
node-exporter-service.yaml
prometheus-configmap.yaml
prometheus-rbac.yaml
prometheus-statefulset.yaml
prometheus-service.yaml
But Prometheus and Alertmanager are in Pending state:
kube-system alertmanager-6bd9584b85-j4h5m 0/2 Pending 0 9m
kube-system calico-etcd-pnwtr 1/1 Running 0 16m
kube-system calico-kube-controllers-5d74847676-mjq4j 1/1 Running 0 16m
kube-system calico-node-59xfk 2/2 Running 1 16m
kube-system calico-node-rqsh5 2/2 Running 1 16m
kube-system coredns-7997f8864c-ckhsq 1/1 Running 0 16m
kube-system coredns-7997f8864c-jjtvq 1/1 Running 0 16m
kube-system etcd-master16g 1/1 Running 0 15m
kube-system heapster-589b7db6c9-mpmks 1/1 Running 0 16m
kube-system kube-apiserver-master16g 1/1 Running 0 15m
kube-system kube-controller-manager-master16g 1/1 Running 0 15m
kube-system kube-proxy-hqq49 1/1 Running 0 16m
kube-system kube-proxy-l8hmh 1/1 Running 0 16m
kube-system kube-scheduler-master16g 1/1 Running 0 16m
kube-system kube-state-metrics-8595f97c4-g6x5x 2/2 Running 0 8m
kube-system kubernetes-dashboard-7d5dcdb6d9-944xl 1/1 Running 0 16m
kube-system monitoring-grafana-7b767fb8dd-mg6dd 1/1 Running 0 16m
kube-system monitoring-influxdb-54bd58b4c9-z9tgd 1/1 Running 0 16m
kube-system node-exporter-f6pmw 1/1 Running 0 8m
kube-system node-exporter-zsd9b 1/1 Running 0 8m
kube-system prometheus-0 0/2 Pending 0 7m
I checked the Prometheus pod with the command shown below:
[root@master16g prometheus]# kubectl describe pod prometheus-0 -n kube-system
Name: prometheus-0
Namespace: kube-system
Node: <none>
Labels: controller-revision-hash=prometheus-8fc558cb5
k8s-app=prometheus
statefulset.kubernetes.io/pod-name=prometheus-0
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Pending
IP:
Controlled By: StatefulSet/prometheus
Init Containers:
init-chown-data:
Image: busybox:latest
Port: <none>
Host Port: <none>
Command:
chown
-R
65534:65534
/data
Environment: <none>
Mounts:
/data from prometheus-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-f6v42 (ro)
Containers:
prometheus-server-configmap-reload:
Image: jimmidyson/configmap-reload:v0.1
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9090/-/reload
Limits:
cpu: 10m
memory: 10Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-f6v42 (ro)
prometheus-server:
Image: prom/prometheus:v2.2.1
Port: 9090/TCP
Host Port: 0/TCP
Args:
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
Limits:
cpu: 200m
memory: 1000Mi
Requests:
cpu: 200m
memory: 1000Mi
Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from prometheus-data (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-f6v42 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
prometheus-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: prometheus-data-prometheus-0
ReadOnly: false
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-config
Optional: false
prometheus-token-f6v42:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-token-f6v42
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 42s (x22 over 5m) default-scheduler pod has unbound PersistentVolumeClaims (repeated 2 times)
In the last line, it shows the warning message: pod has unbound PersistentVolumeClaims (repeated 2 times)
The Prometheus logs say:
[root@master16g prometheus]# kubectl logs prometheus-0 -n kube-system
Error from server (BadRequest): a container name must be specified for pod prometheus-0, choose one of: [prometheus-server-configmap-reload prometheus-server] or one of the init containers: [init-chown-data]
Then I describe the Alertmanager pod and its logs:
[root@master16g prometheus]# kubectl describe pod alertmanager-6bd9584b85-j4h5m -n kube-system
Name: alertmanager-6bd9584b85-j4h5m
Namespace: kube-system
Node: <none>
Labels: k8s-app=alertmanager
pod-template-hash=2685140641
version=v0.14.0
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Pending
IP:
Controlled By: ReplicaSet/alertmanager-6bd9584b85
Containers:
prometheus-alertmanager:
Image: prom/alertmanager:v0.14.0
Port: 9093/TCP
Host Port: 0/TCP
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
--web.external-url=/
Limits:
cpu: 10m
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Readiness: http-get http://:9093/%23/status delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-snfrt (ro)
prometheus-alertmanager-configmap-reload:
Image: jimmidyson/configmap-reload:v0.1
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://localhost:9093/-/reload
Limits:
cpu: 10m
memory: 10Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-snfrt (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: alertmanager-config
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: alertmanager
ReadOnly: false
default-token-snfrt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-snfrt
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x26 over 9m) default-scheduler pod has unbound PersistentVolumeClaims (repeated 2 times)
And its log:
[root@master16g prometheus]# kubectl logs alertmanager-6bd9584b85-j4h5m -n kube-system
Error from server (BadRequest): a container name must be specified for pod alertmanager-6bd9584b85-j4h5m, choose one of: [prometheus-alertmanager prometheus-alertmanager-configmap-reload]
It has the same warning message as Prometheus:
pod has unbound PersistentVolumeClaims (repeated 2 times)
Then I get the PVCs by issuing the following command:
[root@master16g prometheus]# kubectl get pvc --all-namespaces
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
kube-system alertmanager Pending standard 20m
kube-system prometheus-data-prometheus-0 Pending standard 19m
My question is: how do I get the PersistentVolumeClaims to bind? And why does the log say a container name must be specified?
===============================================================
Second edit
Since the PVC file references a storage class, I need to define a StorageClass YAML. How do I do that if I want NFS or GlusterFS? That way I could avoid a cloud vendor such as Google or AWS.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
storageClassName: standard
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"
This log entry:
Error from server (BadRequest): a container name must be specified for pod alertmanager-6bd9584b85-j4h5m, choose one of: [prometheus-alertmanager prometheus-alertmanager-configmap-reload]
means Pod alertmanager-6bd9584b85-j4h5m consists of two containers:
prometheus-alertmanager
prometheus-alertmanager-configmap-reload
When you use kubectl logs on a Pod which consists of more than one container, you must specify the name of the container whose logs you want to view. Command template:
kubectl -n <namespace> logs <pod_name> <container_name>
For example, to view the logs of the container prometheus-alertmanager, which is part of the Pod alertmanager-6bd9584b85-j4h5m in the namespace kube-system, use this command:
kubectl -n kube-system logs alertmanager-6bd9584b85-j4h5m prometheus-alertmanager
The Pending status of the PVCs could mean you have no corresponding PVs (and no dynamic provisioner behind the standard StorageClass).
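Since both PVCs request the standard StorageClass (2Gi for alertmanager), one way to unblock them without a cloud vendor is to create matching PVs by hand, for example backed by NFS. A minimal sketch (the NFS server address and export path are placeholders; the capacity must cover the PVC request):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv
spec:
  storageClassName: standard
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: 192.168.10.10          # placeholder NFS server
    path: /exports/alertmanager    # placeholder export path

A second PV of the same shape, sized for prometheus-data-prometheus-0, would also be needed. Alternatively, deploy a dynamic provisioner (for example an NFS or GlusterFS/Heketi based one) and have it serve the standard StorageClass so the PVCs bind automatically.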
I have deployed Kubernetes on a virt-manager VM following this link:
https://kubernetes.io/docs/setup/independent/install-kubeadm/
When I join another VM to the cluster, I find that kube-dns is in the Pending state.
root@ubuntu1:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-ubuntu1 1/1 Running 0 7m
kube-system kube-apiserver-ubuntu1 1/1 Running 0 8m
kube-system kube-controller-manager-ubuntu1 1/1 Running 0 8m
kube-system kube-dns-86f4d74b45-br6ck 0/3 Pending 0 8m
kube-system kube-proxy-sh9lg 1/1 Running 0 8m
kube-system kube-proxy-zwdt5 1/1 Running 0 7m
kube-system kube-scheduler-ubuntu1 1/1 Running 0 8m
root@ubuntu1:~# kubectl --namespace=kube-system describe pod kube-dns-86f4d74b45-br6ck
Name: kube-dns-86f4d74b45-br6ck
Namespace: kube-system
Node: <none>
Labels: k8s-app=kube-dns
pod-template-hash=4290830601
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/kube-dns-86f4d74b45
Containers:
kubedns:
Image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.8
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-dir=/kube-dns-config
--v=2
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/kube-dns-config from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-4fjt4 (ro)
dnsmasq:
Image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.8
Ports: 53/UDP, 53/TCP
Host Ports: 0/UDP, 0/TCP
Args:
-v=2
-logtostderr
-configDir=/etc/k8s/dns/dnsmasq-nanny
-restartDnsmasq=true
--
-k
--cache-size=1000
--no-negcache
--log-facility=-
--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/ip6.arpa/127.0.0.1#10053
Requests:
cpu: 150m
memory: 20Mi
Liveness: http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-4fjt4 (ro)
sidecar:
Image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.8
Port: 10054/TCP
Host Port: 0/TCP
Args:
--v=2
--logtostderr
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
Requests:
cpu: 10m
memory: 20Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-4fjt4 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-dns-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-dns
Optional: true
kube-dns-token-4fjt4:
Type: Secret (a volume populated by a Secret)
SecretName: kube-dns-token-4fjt4
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m (x7 over 7m) default-scheduler 0/1 nodes are available: 1 node(s) were not ready.
Warning FailedScheduling 3s (x19 over 6m) default-scheduler 0/2 nodes are available: 2 node(s) were not ready.
Can anyone help me deconstruct this and find the actual issue?
Any help would be of great use.
Thanks in advance.
In addition to what @justcompile wrote, you will need a minimum of 2 CPU cores in order to run all pods from the kube-system namespace without issues.
You need to verify how many resources you have on that box and compare them with the CPU reservations each of the Pods makes.
For example, in the output you provided I can see that your DNS service tries to make a reservation for 10% of a CPU core:
Requests:
cpu: 100m
You can check each of the deployed pods and their CPU reservations using:
kubectl describe pods --namespace=kube-system
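To compare those requests with what the node can actually offer, the node description shows both the allocatable resources and what is already reserved. A rough sketch (the grep context sizes are just a convenience and may need adjusting for your kubectl version):

kubectl describe nodes | grep -A 5 "Allocatable:"
kubectl describe nodes | grep -A 8 "Allocated resources:"

If the summed CPU requests of the kube-system pods exceed the node's allocatable CPU, kube-dns will stay Pending until you add cores or lower the requests.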
In your kubectl get pods --all-namespaces output I cannot see any pod-network pods.
You need to choose a network implementation and install a pod network add-on before kube-dns can be deployed fully. For details, see "kube-dns is stuck in the Pending state" and install a pod network solution.
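As a sketch of that workflow (the manifest URL is a placeholder; take the real one from the documentation of whichever add-on you pick, for example Flannel or Calico, matching your Kubernetes version):

# install the pod network add-on of your choice
kubectl apply -f <pod-network-manifest.yml>
# then watch kube-dns leave Pending once the network pods are up
kubectl --namespace=kube-system get pods --watch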
Firstly, if you run kubectl get nodes does this show both/all nodes in a Ready state?
If they are: I faced this problem and found, when inspecting kubectl get events, that the pods were failing because they required a minimum of 2 CPUs to run.
As I was initially running this on an old MacBook Pro via VirtualBox, I had to give up and use AWS (other cloud platforms are of course available) in order to get multiple CPUs per node.
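For reference, the two checks mentioned above are plain kubectl commands, nothing cluster-specific:

kubectl get nodes
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp

The second command lists recent events cluster-wide in chronological order, which is where scheduling failures such as Insufficient cpu show up.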