Kubernetes Node status ready but can not be seen by scheduler - kubernetes

I've set up a Kubernetes cluster with three nodes, i get all my nodes status ready, but the scheduler seems not find one of them. How could this happen.
[root#master1 app]# kubectl get nodes
NAME LABELS STATUS AGE
172.16.0.44 kubernetes.io/hostname=172.16.0.44,pxc=node1 Ready 8d
172.16.0.45 kubernetes.io/hostname=172.16.0.45 Ready 8d
172.16.0.46 kubernetes.io/hostname=172.16.0.46 Ready 8d
I use nodeSelect in my RC file like thie:
nodeSelector:
pxc: node1
describe the rc:
Name: mongo-controller
Namespace: kube-system
Image(s): mongo
Selector: k8s-app=mongo
Labels: k8s-app=mongo
Replicas: 1 current / 1 desired
Pods Status: 0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Volumes:
mongo-persistent-storage:
Type: HostPath (bare host directory volume)
Path: /k8s/mongodb
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
25m 25m 1 {replication-controller } SuccessfulCreate Created pod: mongo-controller-0wpwu
get pods to be pending:
[root#master1 app]# kubectl get pods mongo-controller-0wpwu --namespace=kube-system
NAME READY STATUS RESTARTS AGE
mongo-controller-0wpwu 0/1 Pending 0 27m
describe pod mongo-controller-0wpwu:
[root#master1 app]# kubectl describe pod mongo-controller-0wpwu --namespace=kube-system
Name: mongo-controller-0wpwu
Namespace: kube-system
Image(s): mongo
Node: /
Labels: k8s-app=mongo
Status: Pending
Reason:
Message:
IP:
Replication Controllers: mongo-controller (1/1 replicas created)
Containers:
mongo:
Container ID:
Image: mongo
Image ID:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Ready: False
Restart Count: 0
Environment Variables:
Volumes:
mongo-persistent-storage:
Type: HostPath (bare host directory volume)
Path: /k8s/mongodb
default-token-7qjcu:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-7qjcu
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
22m 37s 12 {default-scheduler } FailedScheduling pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.46): MatchNodeSelector
fit failure on node (172.16.0.45): MatchNodeSelector
27m 9s 67 {default-scheduler } FailedScheduling pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
See the ip list in events, The 172.16.0.44 seems not seen by the scheduler? How could the happen?
describe the node 172.16.0.44
[root#master1 app]# kubectl describe nodes --namespace=kube-system
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Wed, 30 Mar 2016 15:58:47 +0800
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready True Fri, 08 Apr 2016 12:18:01 +0800 Fri, 08 Apr 2016 11:18:52 +0800 KubeletReady kubelet is posting ready status
OutOfDisk Unknown Wed, 30 Mar 2016 15:58:47 +0800 Thu, 07 Apr 2016 17:38:50 +0800 NodeStatusNeverUpdated Kubelet never posted node status.
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Container Runtime Version: docker://1.10.1
Kubelet Version: v1.2.0
Kube-Proxy Version: v1.2.0
ExternalID: 172.16.0.44
Non-terminated Pods: (1 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
───────── ──── ──────────── ────────── ─────────────── ─────────────
kube-system kube-registry-proxy-172.16.0.44 100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%)
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
CPU Requests CPU Limits Memory Requests Memory Limits
──────────── ────────── ─────────────── ─────────────
100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%)
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
59m 59m 1 {kubelet 172.16.0.44} Starting Starting kubelet.
Ssh login 44, i get the disk space is free(i also remove some docker images and containers):
[root#iZ25dqhvvd0Z ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 40G 2.6G 35G 7% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
tmpfs 3.7G 143M 3.6G 4% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
/dev/xvdb 40G 361M 37G 1% /k8s
Still docker logs scheduler(v1.3.0-alpha.1 ) get this
E0408 05:28:42.679448 1 factory.go:387] Error scheduling kube-system mongo-controller-0wpwu: pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
; retrying
I0408 05:28:42.679577 1 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"mongo-controller-0wpwu", UID:"2d0f0844-fd3c-11e5-b531-00163e000727", APIVersion:"v1", ResourceVersion:"634139", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector

Thanks for your replay Robert. i got this resolve by doing below:
kubectl delete rc
kubectl delete node 172.16.0.44
stop kubelet in 172.16.0.44
rm -rf /k8s/*
restart kubelet
Now the node is ready, and out of disk is gone.
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Fri, 08 Apr 2016 15:14:51 +0800
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready True Fri, 08 Apr 2016 15:25:33 +0800 Fri, 08 Apr 2016 15:14:50 +0800 KubeletReady kubelet is posting ready status
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
I found this https://github.com/kubernetes/kubernetes/issues/4135, but still don't know why my disk space is free and kubelet thinks it is out of disk...

The reason the scheduler failed is that there wasn't space to fit the pod onto the node. If you look at the conditions for your node, it says that the OutOfDisk condition is Unknown. The scheduler is probably not willing to place on pod onto a node that it thinks doesn't have available disk space.

We had the same issue in AWS when they changed DNS from IP=DNS name to IP=IP.eu-central: Nodes showed ready but where not reachable via their name.

Related

Using GPU with Kubernetes GKE and Node auto-provisioning

I try to do something fairly simple: To run a GPU machine in a k8s cluster using auto-provisioning. When deploying the Pod with a limits: nvidia.com/gpu specification the auto-provisioning is correctly creating a node-pool and scaling up an appropriate node. However, the Pod stays at Pending with the following message:
Warning FailedScheduling 59s (x5 over 2m46s) default-scheduler 0/10 nodes are available: 10 Insufficient nvidia.com/gpu.
It seems like taints and tolerations are added correctly by gke. It just doesnt scale up.
Ive followed the instructions here:
https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers
To reproduce:
Create a new cluster in a zone with auto-provisioning that includes gpu (I have replaced my own project name with MYPROJECT). This command is what comes out of the console when these changes are done:
gcloud beta container --project "MYPROJECT" clusters create "cluster-2" --zone "europe-west4-a" --no-enable-basic-auth --cluster-version "1.18.12-gke.1210" --release-channel "regular" --machine-type "e2-medium" --image-type "COS" --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes "1" --enable-stackdriver-kubernetes --enable-ip-alias --network "projects/MYPROJECT/global/networks/default" --subnetwork "projects/MYPROJECT/regions/europe-west4/subnetworks/default" --default-max-pods-per-node "110" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-autoprovisioning --min-cpu 1 --max-cpu 20 --min-memory 1 --max-memory 50 --max-accelerator type="nvidia-tesla-p100",count=1 --enable-autoprovisioning-autorepair --enable-autoprovisioning-autoupgrade --autoprovisioning-max-surge-upgrade 1 --autoprovisioning-max-unavailable-upgrade 0 --enable-vertical-pod-autoscaling --enable-shielded-nodes --node-locations "europe-west4-a"
Install NVIDIA drivers by installing DaemonSet:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
Deploy pod that requests GPU:
my-gpu-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
name: my-gpu-pod
spec:
containers:
- name: my-gpu-container
image: nvidia/cuda:11.0-runtime-ubuntu18.04
command: ["/bin/bash", "-c", "--"]
args: ["while true; do sleep 600; done;"]
resources:
limits:
nvidia.com/gpu: 1
kubectl apply -f my-gpu-pod.yaml
Help would be really appreciated as Ive spent quite some time on this now :)
Edit: Here is the running Pod and Node specifications (the node that was auto-scaled):
Name: my-gpu-pod
Namespace: default
Priority: 0
Node: <none>
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
my-gpu-container:
Image: nvidia/cuda:11.0-runtime-ubuntu18.04
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
--
Args:
while true; do sleep 600; done;
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9rvjz (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-9rvjz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9rvjz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nvidia.com/gpu:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 11m cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added):
Warning FailedScheduling 5m54s (x6 over 11m) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
Warning FailedScheduling 54s (x7 over 5m37s) default-scheduler 0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
Name: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=n1-standard-1
beta.kubernetes.io/os=linux
cloud.google.com/gke-accelerator=nvidia-tesla-p100
cloud.google.com/gke-boot-disk=pd-standard
cloud.google.com/gke-nodepool=nap-n1-standard-1-gpu1-18jc7z9w
cloud.google.com/gke-os-distribution=cos
cloud.google.com/machine-family=n1
failure-domain.beta.kubernetes.io/region=europe-west4
failure-domain.beta.kubernetes.io/zone=europe-west4-a
kubernetes.io/arch=amd64
kubernetes.io/hostname=gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
kubernetes.io/os=linux
node.kubernetes.io/instance-type=n1-standard-1
topology.gke.io/zone=europe-west4-a
topology.kubernetes.io/region=europe-west4
topology.kubernetes.io/zone=europe-west4-a
Annotations: container.googleapis.com/instance_id: 7877226485154959129
csi.volume.kubernetes.io/nodeid:
{"pd.csi.storage.gke.io":"projects/exor-arctic/zones/europe-west4-a/instances/gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2"}
node.alpha.kubernetes.io/ttl: 0
node.gke.io/last-applied-node-labels:
cloud.google.com/gke-accelerator=nvidia-tesla-p100,cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-nodepool=nap-n1-standar...
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 22 Mar 2021 11:32:17 +0100
Taints: nvidia.com/gpu=present:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
AcquireTime: <unset>
RenewTime: Mon, 22 Mar 2021 11:38:58 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
KernelDeadlock False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 KernelHasNoDeadlock kernel has no deadlock
ReadonlyFilesystem False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 FilesystemIsNotReadOnly Filesystem is not read-only
CorruptDockerOverlay2 False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoCorruptDockerOverlay2 docker overlay2 is functioning properly
FrequentUnregisterNetDevice False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentUnregisterNetDevice node is functioning properly
FrequentKubeletRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentKubeletRestart kubelet is functioning properly
FrequentDockerRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentDockerRestart docker is functioning properly
FrequentContainerdRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentContainerdRestart containerd is functioning properly
NetworkUnavailable False Mon, 22 Mar 2021 11:32:18 +0100 Mon, 22 Mar 2021 11:32:18 +0100 RouteCreated NodeController create implicit route
MemoryPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:19 +0100 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.164.0.16
ExternalIP: 35.204.55.105
InternalDNS: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2.c.exor-arctic.internal
Hostname: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2.c.exor-arctic.internal
Capacity:
attachable-volumes-gce-pd: 127
cpu: 1
ephemeral-storage: 98868448Ki
hugepages-2Mi: 0
memory: 3776196Ki
pods: 110
Allocatable:
attachable-volumes-gce-pd: 127
cpu: 940m
ephemeral-storage: 47093746742
hugepages-2Mi: 0
memory: 2690756Ki
pods: 110
System Info:
Machine ID: 307671eefc01914a7bfacf17a48e087e
System UUID: 307671ee-fc01-914a-7bfa-cf17a48e087e
Boot ID: acd58f3b-1659-494c-b83d-427f834d23a6
Kernel Version: 5.4.49+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.9
Kubelet Version: v1.18.12-gke.1210
Kube-Proxy Version: v1.18.12-gke.1210
PodCIDR: 10.100.1.0/24
PodCIDRs: 10.100.1.0/24
ProviderID: gce://exor-arctic/europe-west4-a/gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system fluentbit-gke-k22gv 100m (10%) 0 (0%) 200Mi (7%) 500Mi (19%) 6m46s
kube-system gke-metrics-agent-5fblx 3m (0%) 0 (0%) 50Mi (1%) 50Mi (1%) 6m47s
kube-system kube-proxy-gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 100m (10%) 0 (0%) 0 (0%) 0 (0%) 6m44s
kube-system nvidia-driver-installer-vmw8r 150m (15%) 0 (0%) 0 (0%) 0 (0%) 6m45s
kube-system nvidia-gpu-device-plugin-8vqsl 50m (5%) 50m (5%) 10Mi (0%) 10Mi (0%) 6m45s
kube-system pdcsi-node-k9brg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6m47s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 403m (42%) 50m (5%)
memory 260Mi (9%) 560Mi (21%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-gce-pd 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 6m47s kubelet Starting kubelet.
Normal NodeAllocatableEnforced 6m47s kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasSufficientPID
Normal NodeReady 6m45s kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeReady
Normal Starting 6m44s kube-proxy Starting kube-proxy.
Warning NodeSysctlChange 6m41s sysctl-monitor
Warning ContainerdStart 6m41s systemd-monitor Starting containerd container runtime...
Warning DockerStart 6m41s (x2 over 6m41s) systemd-monitor Starting Docker Application Container Engine...
Warning KubeletStart 6m41s systemd-monitor Started Kubernetes kubelet.
As per the Kubernetes Documentation https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#nvidia-gpu-device-plugin-used-by-gce, we are supposed to use https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/daemonset.yaml.
So can you run
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/daemonset.yaml
A common error related with GKE is about project Quotas limiting resources, this could lead to the nodes not auto-provisioning or scaling up due to not being able to assign the resources.
Maybe your project Quotas for GPU (or specifically for nvidia-tesla-p100) are set to 0 or to a number way below to the requested one.
In this link there's more information about how to check it and how to request more resources for your quota.
Also, I see that you're making use of shared-core E2 instances, which are not compatible with accelerators. It shouldn't be an issue as GKE should automatically change the machine type to N1 if it detects the workload contains a GPU, as seen in this link, but still maybe attempt to run the cluster with other machine types such as N1.
You might be having a problem with the scopes.
When using node auto-provisioning with GPUs, the auto-provisioned node pools by default do not have sufficient scopes to run the installation DaemonSet. You need to manually change the default autoprovisioning scopes to enable that.
In this case the documented scopes that are required at the time of writting are:
[ "https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/compute"
]
this article mentions this very issue: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#using_node_auto-provisioning_with_gpus
you might just have to expand them and retry. Manually it works because you have the necessary scopes.

Minikube pod keep a forever pending status and failed to be scheduled

I am very begginer on kubernetes. Sorry if this a is dumb question.
I am using minikube and kvm2(5.0.0). Here is the info about minikube and kubectl version
Minikube status output
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
kubectl cluster-info output:
Kubernetes master is running at https://127.0.0.1:32768
KubeDNS is running at https://127.0.0.1:32768/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
I am trying to deploy a pod using kubectl apply -f client-pod.yaml. Here is my client-pod.yaml configuration
apiVersion: v1
kind: Pod
metadata:
name: client-pod
labels:
component: web
spec:
containers:
- name: client
image: stephengrider/multi-client
ports:
- containerPort: 3000
This is the kubectl get pods output:
NAME READY STATUS RESTARTS AGE
client-pod 0/1 Pending 0 4m15s
kubectl describe pods output:
Name: client-pod
Namespace: default
Priority: 0
Node: <none>
Labels: component=web
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"component":"web"},"name":"client-pod","namespace":"default"},"spec...
Status: Pending
IP:
IPs: <none>
Containers:
client:
Image: stephengrider/multi-client
Port: 3000/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-z45bq (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-z45bq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-z45bq
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
I have been searching for a way to see which taints is stopping to pod to initialize wihout luck.
Is there a way to see the taint that is failing?
kubectl get nodes output:
NAME STATUS ROLES AGE VERSION
m01 Ready master 11h v1.17.3
-- EDIT --
kubectl describe nodes output:
Name: home-pc
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=home-pc
kubernetes.io/os=linux
minikube.k8s.io/commit=eb13446e786c9ef70cb0a9f85a633194e62396a1
minikube.k8s.io/name=minikube
minikube.k8s.io/updated_at=2020_03_17T22_51_28_0700
minikube.k8s.io/version=v1.8.2
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 17 Mar 2020 22:51:25 -0500
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: home-pc
AcquireTime: <unset>
RenewTime: Tue, 17 Mar 2020 22:51:41 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:41 -0500 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.0.12
Hostname: home-pc
Capacity:
cpu: 12
ephemeral-storage: 227688908Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8159952Ki
pods: 110
Allocatable:
cpu: 12
ephemeral-storage: 209838097266
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8057552Ki
pods: 110
System Info:
Machine ID: 339d426453b4492da92f75d06acc1e0d
System UUID: 62eedb55-444f-61ce-75e9-b06ebf3331a0
Boot ID: a9ae9889-d7cb-48c5-ae75-b2052292ac7a
Kernel Version: 5.0.0-38-generic
OS Image: Ubuntu 19.04
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.5
Kubelet Version: v1.17.3
Kube-Proxy Version: v1.17.3
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-6955765f44-mbwqt 100m (0%) 0 (0%) 70Mi (0%) 170Mi (2%) 10s
kube-system coredns-6955765f44-sblf2 100m (0%) 0 (0%) 70Mi (0%) 170Mi (2%) 10s
kube-system etcd-home-pc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-apiserver-home-pc 250m (2%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-controller-manager-home-pc 200m (1%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-proxy-lk7xs 0 (0%) 0 (0%) 0 (0%) 0 (0%) 10s
kube-system kube-scheduler-home-pc 100m (0%) 0 (0%) 0 (0%) 0 (0%) 12s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (6%) 0 (0%)
memory 140Mi (1%) 340Mi (4%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 24s kubelet, home-pc Starting kubelet.
Normal NodeHasSufficientMemory 23s (x4 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 23s (x3 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 23s (x3 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 23s kubelet, home-pc Updated Node Allocatable limit across pods
Normal Starting 13s kubelet, home-pc Starting kubelet.
Normal NodeHasSufficientMemory 13s kubelet, home-pc Node home-pc status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 13s kubelet, home-pc Node home-pc status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 13s kubelet, home-pc Node home-pc status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 13s kubelet, home-pc Updated Node Allocatable limit across pods
Normal Starting 9s kube-proxy, home-pc Starting kube-proxy.
Normal NodeReady 3s kubelet, home-pc Node home-pc status is now: NodeReady
You have some taints on the node which is stopping the scheduler from deploying the pod.Either remove the taint from master node or add tolerations in the pod spec.

cluster-autoscaler and dns-controller continuously evicting

I have just terminated a AWS K8S node, and now.
K8S recreated a new one, and installed new pods. Everything seems good so far.
But when I do:
kubectl get po -A
I get:
kube-system cluster-autoscaler-648b4df947-42hxv 0/1 Evicted 0 3m53s
kube-system cluster-autoscaler-648b4df947-45pcc 0/1 Evicted 0 47m
kube-system cluster-autoscaler-648b4df947-46w6h 0/1 Evicted 0 91m
kube-system cluster-autoscaler-648b4df947-4tlbl 0/1 Evicted 0 69m
kube-system cluster-autoscaler-648b4df947-52295 0/1 Evicted 0 3m54s
kube-system cluster-autoscaler-648b4df947-55wzb 0/1 Evicted 0 83m
kube-system cluster-autoscaler-648b4df947-57kv5 0/1 Evicted 0 107m
kube-system cluster-autoscaler-648b4df947-69rsl 0/1 Evicted 0 98m
kube-system cluster-autoscaler-648b4df947-6msx2 0/1 Evicted 0 11m
kube-system cluster-autoscaler-648b4df947-6pphs 0 18m
kube-system dns-controller-697f6d9457-zswm8 0/1 Evicted 0 54m
When I do:
kubectl describe pod -n kube-system dns-controller-697f6d9457-zswm8
I get:
➜ monitoring git:(master) ✗ kubectl describe pod -n kube-system dns-controller-697f6d9457-zswm8
Name: dns-controller-697f6d9457-zswm8
Namespace: kube-system
Priority: 0
Node: ip-172-20-57-13.eu-west-3.compute.internal/
Start Time: Mon, 07 Oct 2019 12:35:06 +0200
Labels: k8s-addon=dns-controller.addons.k8s.io
k8s-app=dns-controller
pod-template-hash=697f6d9457
version=v1.12.0
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Failed
Reason: Evicted
Message: The node was low on resource: ephemeral-storage. Container dns-controller was using 48Ki, which exceeds its request of 0.
IP:
IPs: <none>
Controlled By: ReplicaSet/dns-controller-697f6d9457
Containers:
dns-controller:
Image: kope/dns-controller:1.12.0
Port: <none>
Host Port: <none>
Command:
/usr/bin/dns-controller
--watch-ingress=false
--dns=aws-route53
--zone=*/ZDOYTALGJJXCM
--zone=*/*
-v=2
Requests:
cpu: 50m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from dns-controller-token-gvxxd (ro)
Volumes:
dns-controller-token-gvxxd:
Type: Secret (a volume populated by a Secret)
SecretName: dns-controller-token-gvxxd
Optional: false
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Evicted 59m kubelet, ip-172-20-57-13.eu-west-3.compute.internal The node was low on resource: ephemeral-storage. Container dns-controller was using 48Ki, which exceeds its request of 0.
Normal Killing 59m kubelet, ip-172-20-57-13.eu-west-3.compute.internal Killing container with id docker://dns-controller:Need to kill Pod
And:
➜ monitoring git:(master) ✗ kubectl describe pod -n kube-system cluster-autoscaler-648b4df947-2zcrz
Name: cluster-autoscaler-648b4df947-2zcrz
Namespace: kube-system
Priority: 0
Node: ip-172-20-57-13.eu-west-3.compute.internal/
Start Time: Mon, 07 Oct 2019 13:26:26 +0200
Labels: app=cluster-autoscaler
k8s-addon=cluster-autoscaler.addons.k8s.io
pod-template-hash=648b4df947
Annotations: prometheus.io/port: 8085
prometheus.io/scrape: true
scheduler.alpha.kubernetes.io/tolerations: [{"key":"dedicated", "value":"master"}]
Status: Failed
Reason: Evicted
Message: Pod The node was low on resource: [DiskPressure].
IP:
IPs: <none>
Controlled By: ReplicaSet/cluster-autoscaler-648b4df947
Containers:
cluster-autoscaler:
Image: gcr.io/google-containers/cluster-autoscaler:v1.15.1
Port: <none>
Host Port: <none>
Command:
./cluster-autoscaler
--v=4
--stderrthreshold=info
--cloud-provider=aws
--skip-nodes-with-local-storage=false
--nodes=0:1:pamela-nodes.k8s-prod.sunchain.fr
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 100m
memory: 300Mi
Liveness: http-get http://:8085/health-check delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8085/health-check delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
AWS_REGION: eu-west-3
Mounts:
/etc/ssl/certs/ca-certificates.crt from ssl-certs (ro)
/var/run/secrets/kubernetes.io/serviceaccount from cluster-autoscaler-token-hld2m (ro)
Volumes:
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs/ca-certificates.crt
HostPathType:
cluster-autoscaler-token-hld2m:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-autoscaler-token-hld2m
Optional: false
QoS Class: Guaranteed
Node-Selectors: kubernetes.io/role=master
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned kube-system/cluster-autoscaler-648b4df947-2zcrz to ip-172-20-57-13.eu-west-3.compute.internal
Warning Evicted 11m kubelet, ip-172-20-57-13.eu-west-3.compute.internal The node was low on resource: [DiskPressure].
It seems to be a ressource issue. Weird thing is before I killed my EC2 instance, I didn t have this issue.
Why is it happening and what should I do? Is it mandatory to add more ressources ?
➜ scripts kubectl describe node ip-172-20-57-13.eu-west-3.compute.internal
Name: ip-172-20-57-13.eu-west-3.compute.internal
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-west-3
failure-domain.beta.kubernetes.io/zone=eu-west-3a
kops.k8s.io/instancegroup=master-eu-west-3a
kubernetes.io/hostname=ip-172-20-57-13.eu-west-3.compute.internal
kubernetes.io/role=master
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 28 Aug 2019 09:38:09 +0200
Taints: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 28 Aug 2019 09:38:36 +0200 Wed, 28 Aug 2019 09:38:36 +0200 RouteCreated RouteController created a route
OutOfDisk False Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:09 +0200 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:09 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Mon, 07 Oct 2019 14:14:32 +0200 Mon, 07 Oct 2019 14:11:02 +0200 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:09 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 07 Oct 2019 14:14:32 +0200 Wed, 28 Aug 2019 09:38:35 +0200 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 172.20.57.13
ExternalIP: 35.180.187.101
InternalDNS: ip-172-20-57-13.eu-west-3.compute.internal
Hostname: ip-172-20-57-13.eu-west-3.compute.internal
ExternalDNS: ec2-35-180-187-101.eu-west-3.compute.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 7797156Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2013540Ki
pods: 110
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 7185858958
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1911140Ki
pods: 110
System Info:
Machine ID: ec2b3aa5df0e3ad288d210f309565f06
System UUID: EC2B3AA5-DF0E-3AD2-88D2-10F309565F06
Boot ID: f9d5417b-eba9-4544-9710-a25d01247b46
Kernel Version: 4.9.0-9-amd64
OS Image: Debian GNU/Linux 9 (stretch)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.3
Kubelet Version: v1.12.10
Kube-Proxy Version: v1.12.10
PodCIDR: 100.96.1.0/24
ProviderID: aws:///eu-west-3a/i-03bf1b26313679d65
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system etcd-manager-events-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 100Mi (5%) 0 (0%) 40d
kube-system etcd-manager-main-ip-172-20-57-13.eu-west-3.compute.internal 200m (10%) 0 (0%) 100Mi (5%) 0 (0%) 40d
kube-system kube-apiserver-ip-172-20-57-13.eu-west-3.compute.internal 150m (7%) 0 (0%) 0 (0%) 0 (0%) 40d
kube-system kube-controller-manager-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 40d
kube-system kube-proxy-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 40d
kube-system kube-scheduler-ip-172-20-57-13.eu-west-3.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 40d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (37%) 0 (0%)
memory 200Mi (10%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasNoDiskPressure 55m (x324 over 40d) kubelet, ip-172-20-57-13.eu-west-3.compute.internal Node ip-172-20-57-13.eu-west-3.compute.internal status is now: NodeHasNoDiskPressure
Warning EvictionThresholdMet 10m (x1809 over 16d) kubelet, ip-172-20-57-13.eu-west-3.compute.internal Attempting to reclaim ephemeral-storage
Warning ImageGCFailed 4m30s (x6003 over 23d) kubelet, ip-172-20-57-13.eu-west-3.compute.internal (combined from similar events): wanted to free 652348620 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete dd37681076e1 (cannot be forced) - image is being used by running container b1800146af29
I think a better command to debug it is:
devops git:(master) ✗ kubectl get events --sort-by=.metadata.creationTimestamp -o wide
LAST SEEN TYPE REASON KIND SOURCE MESSAGE SUBOBJECT FIRST SEEN COUNT NAME
10m Warning ImageGCFailed Node kubelet, ip-172-20-57-13.eu-west-3.compute.internal (combined from similar events): wanted to free 653307084 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete dd37681076e1 (cannot be forced) - image is being used by running container b1800146af29 23d 6004 ip-172-20-57-13.eu-west-3.compute.internal.15c4124e15eb1d33
2m59s Warning ImageGCFailed Node kubelet, ip-172-20-36-135.eu-west-3.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 639524044 bytes, but freed 0 bytes 7d9h 2089 ip-172-20-36-135.eu-west-3.compute.internal.15c916d24afe2c25
4m59s Warning ImageGCFailed Node kubelet, ip-172-20-33-81.eu-west-3.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 458296524 bytes, but freed 0 bytes 4d14h 1183 ip-172-20-33-81.eu-west-3.compute.internal.15c9f3fe4e1525ec
6m43s Warning EvictionThresholdMet Node kubelet, ip-172-20-57-13.eu-west-3.compute.internal Attempting to reclaim ephemeral-storage 16d 1841 ip-172-20-57-13.eu-west-3.compute.internal.15c66e349b761219
41s Normal NodeHasNoDiskPressure Node kubelet, ip-172-20-57-13.eu-west-3.compute.internal Node ip-172-20-57-13.eu-west-3.compute.internal status is now: NodeHasNoDiskPressure 40d 333 ip-172-20-57-13.eu-west-3.compute.internal.15bf05cec37981b6
Now df -h
admin#ip-172-20-57-13:/var/log$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 972M 0 972M 0% /dev
tmpfs 197M 2.3M 195M 2% /run
/dev/nvme0n1p2 7.5G 6.4G 707M 91% /
tmpfs 984M 0 984M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 984M 0 984M 0% /sys/fs/cgroup
/dev/nvme1n1 20G 430M 20G 3% /mnt/master-vol-09618123eb79d92c8
/dev/nvme2n1 20G 229M 20G 2% /mnt/master-vol-05c9684f0edcbd876
It looks like your nodes/master is running low on storage? I see only 1GB for ephemeral storage available.
You should free up some space on the node and master. It should get rid of your problem.

PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"

I am setting up a kubernetes lab using one node only and learning to setup kubernetes nfs.
I am following kubernetes nfs example step by step from the following link:
https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs
Trying the first section, NFS server part, executed 3 commands:
$ kubectl create -f examples/volumes/nfs/provisioner/nfs-server-gce-pv.yaml
$ kubectl create -f examples/volumes/nfs/nfs-server-rc.yaml
$ kubectl create -f examples/volumes/nfs/nfs-server-service.yaml
I experience problem, where I see the following event:
PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
Research done:
https://github.com/kubernetes/kubernetes/issues/43120
https://github.com/kubernetes/examples/pull/30
None of those links above help me to resolve issue I experience.
I have made sure it is using image 0.8.
Image: gcr.io/google_containers/volume-nfs:0.8
Does anyone know what does this message mean?
Clue and guidance on how to troubleshoot this issue is very much appreciated.
Thank you.
$ docker version
Client:
Version: 17.09.0-ce
API version: 1.32
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:41:23 2017
OS/Arch: linux/amd64
Server:
Version: 17.09.0-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:42:49 2017
OS/Arch: linux/amd64
Experimental: false
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:39:33Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
lab-kube-06 Ready master 2m v1.8.3
$ kubectl describe nodes lab-kube-06
Name: lab-kube-06
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=lab-kube-06
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Thu, 16 Nov 2017 16:51:28 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.0.6
Hostname: lab-kube-06
Capacity:
cpu: 2
memory: 8159076Ki
pods: 110
Allocatable:
cpu: 2
memory: 8056676Ki
pods: 110
System Info:
Machine ID: e198b57826ab4704a6526baea5fa1d06
System UUID: 05EF54CC-E8C8-874B-A708-BBC7BC140FF2
Boot ID: 3d64ad16-5603-42e9-bd34-84f6069ded5f
Kernel Version: 3.10.0-693.el7.x86_64
OS Image: Red Hat Enterprise Linux Server 7.4 (Maipo)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://Unknown
Kubelet Version: v1.8.3
Kube-Proxy Version: v1.8.3
ExternalID: lab-kube-06
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system etcd-lab-kube-06 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-lab-kube-06 250m (12%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-lab-kube-06 200m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-dns-545bc4bfd4-gmdvn 260m (13%) 0 (0%) 110Mi (1%) 170Mi (2%)
kube-system kube-proxy-68w8k 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-lab-kube-06 100m (5%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-7zlbg 20m (1%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
830m (41%) 0 (0%) 110Mi (1%) 170Mi (2%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 39m kubelet, lab-kube-06 Starting kubelet.
Normal NodeAllocatableEnforced 39m kubelet, lab-kube-06 Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 39m (x8 over 39m) kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 39m (x8 over 39m) kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 39m (x7 over 39m) kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasNoDiskPressure
Normal Starting 38m kube-proxy, lab-kube-06 Starting kube-proxy.
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs-pv-provisioning-demo Pending 14s
$ kubectl get events
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
18m 18m 1 lab-kube-06.14f79f093119829a Node Normal Starting kubelet, lab-kube-06 Starting kubelet.
18m 18m 8 lab-kube-06.14f79f0931d0eb6e Node Normal NodeHasSufficientDisk kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientDisk
18m 18m 8 lab-kube-06.14f79f0931d1253e Node Normal NodeHasSufficientMemory kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientMemory
18m 18m 7 lab-kube-06.14f79f0931d131be Node Normal NodeHasNoDiskPressure kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasNoDiskPressure
18m 18m 1 lab-kube-06.14f79f0932f3f1b0 Node Normal NodeAllocatableEnforced kubelet, lab-kube-06 Updated Node Allocatable limit across pods
18m 18m 1 lab-kube-06.14f79f122a32282d Node Normal RegisteredNode controllermanager Node lab-kube-06 event: Registered Node lab-kube-06 in Controller
17m 17m 1 lab-kube-06.14f79f1cdfc4c3b1 Node Normal Starting kube-proxy, lab-kube-06 Starting kube-proxy.
17m 17m 1 lab-kube-06.14f79f1d94ef1c17 Node Normal RegisteredNode controllermanager Node lab-kube-06 event: Registered Node lab-kube-06 in Controller
14m 14m 1 lab-kube-06.14f79f4b91cf73b3 Node Normal RegisteredNode controllermanager Node lab-kube-06 event: Registered Node lab-kube-06 in Controller
58s 11m 42 nfs-pv-provisioning-demo.14f79f766cf887f2 PersistentVolumeClaim Normal FailedBinding persistentvolume-controller no persistent volumes available for this claim and no storage class is set
14s 4m 20 nfs-server-kq44h.14f79fd21b9db5f9 Pod Warning FailedScheduling default-scheduler PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
4m 4m 1 nfs-server.14f79fd21b946027 ReplicationController Normal SuccessfulCreate replication-controller Created pod: nfs-server-kq44h
2m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-kq44h 0/1 Pending 0 16s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-kq44h 0/1 Pending 0 26s
$ kubectl get rc
NAME DESIRED CURRENT READY AGE
nfs-server 1 1 0 40s
$ kubectl describe pods nfs-server-kq44h
Name: nfs-server-kq44h
Namespace: default
Node: <none>
Labels: role=nfs-server
Annotations: kubernetes.io/created-
by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-server","uid":"5653eb53-caf0-11e7-ac02-000d3a04eb...
Status: Pending
IP:
Created By: ReplicationController/nfs-server
Controlled By: ReplicationController/nfs-server
Containers:
nfs-server:
Image: gcr.io/google_containers/volume-nfs:0.8
Ports: 2049/TCP, 20048/TCP, 111/TCP
Environment: <none>
Mounts:
/exports from mypvc (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-plgv5 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
mypvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-pv-provisioning-demo
ReadOnly: false
default-token-plgv5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-plgv5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 39s (x22 over 5m) default-scheduler PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
Each Persistent Volume Claim (PVC) needs a Persistent Volume (PV) that it can bind to. In your example, you have only created a PVC, but not the volume itself.
A PV can either be created manually, or automatically by using a Volume class with a provisioner. Have a look at the docs of static and dynamic provisioning for more information):
There are two ways PVs may be provisioned: statically or dynamically.
Static
A cluster administrator creates a number of PVs. They carry the details of the real storage which is available for use by cluster users. [...]
Dynamic
When none of the static PVs the administrator created matches a user’s PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC. This provisioning is based on StorageClasses: the PVC must request a class and the administrator must have created and configured that class in order for dynamic provisioning to occur.
In your example, you are creating a storage class provisioner (defined in examples/volumes/nfs/provisioner/nfs-server-gce-pv.yaml) that seems to be tailored for usage within the Google cloud (which it will probably not be able to actually create PVs in your lab setup).
You can create a persistent volume manually on your own. After creating the PV, the PVC should automatically bind itself to the volume and your pods should start. Below is an example for a persistent volume that uses the node's local file system as a volume (which is probably OK for a one-node test setup):
apiVersion: v1
kind: PersistentVolume
metadata:
name: someVolume
spec:
capacity:
storage: 200Gi
accessModes:
- ReadWriteOnce
hostPath:
path: /path/on/host
For a production setup, you'll probably want to choose a different volume type at hostPath, although the volume types available to you will greatly differ depending on the environment that you're in (cloud or self-hosted/bare-metal).

kubernetes cluster master node not ready

i do not know why ,my master node in not ready status,all pods on cluster run normally, and i use cabernets v1.7.5 ,and network plugin use calico,and os version is "centos7.2.1511"
# kubectl get nodes
NAME STATUS AGE VERSION
k8s-node1 Ready 1h v1.7.5
k8s-node2 NotReady 1h v1.7.5
# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system po/calico-node-11kvm 2/2 Running 0 33m
kube-system po/calico-policy-controller-1906845835-1nqjj 1/1 Running 0 33m
kube-system po/calicoctl 1/1 Running 0 33m
kube-system po/etcd-k8s-node2 1/1 Running 1 15m
kube-system po/kube-apiserver-k8s-node2 1/1 Running 1 15m
kube-system po/kube-controller-manager-k8s-node2 1/1 Running 2 15m
kube-system po/kube-dns-2425271678-2mh46 3/3 Running 0 1h
kube-system po/kube-proxy-qlmbx 1/1 Running 1 1h
kube-system po/kube-proxy-vwh6l 1/1 Running 0 1h
kube-system po/kube-scheduler-k8s-node2 1/1 Running 2 15m
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/kubernetes 10.96.0.1 <none> 443/TCP 1h
kube-system svc/kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 1h
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-system deploy/calico-policy-controller 1 1 1 1 33m
kube-system deploy/kube-dns 1 1 1 1 1h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system rs/calico-policy-controller-1906845835 1 1 1 33m
kube-system rs/kube-dns-2425271678 1 1 1 1h
update
it seems master node can not recognize the calico network plugin, i use kubeadm to install k8s cluster ,due to kubeadm start etcd on 127.0.0.1:2379 on master node,and calico on other nodes can not talk with etcd,so i modify etcd.yaml as following ,and all calico pods run fine, i do not very familiar with calico ,how to fix it ?
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
component: etcd
tier: control-plane
name: etcd
namespace: kube-system
spec:
containers:
- command:
- etcd
- --listen-client-urls=http://127.0.0.1:2379,http://10.161.233.80:2379
- --advertise-client-urls=http://10.161.233.80:2379
- --data-dir=/var/lib/etcd
image: gcr.io/google_containers/etcd-amd64:3.0.17
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /health
port: 2379
scheme: HTTP
initialDelaySeconds: 15
timeoutSeconds: 15
name: etcd
resources: {}
volumeMounts:
- mountPath: /etc/ssl/certs
name: certs
- mountPath: /var/lib/etcd
name: etcd
- mountPath: /etc/kubernetes
name: k8s
readOnly: true
hostNetwork: true
volumes:
- hostPath:
path: /etc/ssl/certs
name: certs
- hostPath:
path: /var/lib/etcd
name: etcd
- hostPath:
path: /etc/kubernetes
name: k8s
status: {}
[root#k8s-node2 calico]# kubectl describe node k8s-node2
Name: k8s-node2
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=k8s-node2
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: node-role.kubernetes.io/master:NoSchedule
CreationTimestamp: Tue, 12 Sep 2017 15:20:57 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Wed, 13 Sep 2017 10:25:58 +0800 Tue, 12 Sep 2017 15:20:57 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 13 Sep 2017 10:25:58 +0800 Tue, 12 Sep 2017 15:20:57 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 13 Sep 2017 10:25:58 +0800 Tue, 12 Sep 2017 15:20:57 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready False Wed, 13 Sep 2017 10:25:58 +0800 Tue, 12 Sep 2017 15:20:57 +0800 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
InternalIP: 10.161.233.80
Hostname: k8s-node2
Capacity:
cpu: 2
memory: 3618520Ki
pods: 110
Allocatable:
cpu: 2
memory: 3516120Ki
pods: 110
System Info:
Machine ID: 3c6ff97c6fbe4598b53fd04e08937468
System UUID: C6238BF8-8E60-4331-AEEA-6D0BA9106344
Boot ID: 84397607-908f-4ff8-8bdc-ff86c364dd32
Kernel Version: 3.10.0-514.6.2.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.7.5
Kube-Proxy Version: v1.7.5
PodCIDR: 10.68.0.0/24
ExternalID: k8s-node2
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system etcd-k8s-node2 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-k8s-node2 250m (12%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-k8s-node2 200m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-qlmbx 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-k8s-node2 100m (5%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
550m (27%) 0 (0%) 0 (0%) 0 (0%)
Events: <none>
It's good practice to run a describe command in order to see what's wrong with your node:
kubectl describe nodes <NODE_NAME>
e.g.: kubectl describe nodes k8s-node2
You should be able to start your investigations from there and add more info to this question if needed.
You need install a Network Policy Provider, this is one of supported provider:
Weave Net for NetworkPolicy.
command line to install:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
After a few seconds, a Weave Net pod should be running on each Node and any further pods you create will be automatically attached to the Weave network.
I think you may need to add tolerations and update the annotations for calico-node in the manifest you are using so that it can run on a master created by kubeadm. Kubeadm taints the master so that pods cannot run on it unless they have a toleration for that taint.
I believe you are using the https://docs.projectcalico.org/v2.5/getting-started/kubernetes/installation/hosted/calico.yaml manifest which has the annotations (that include tolerations) for K8s v1.5, you should check https://docs.projectcalico.org/v2.5/getting-started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml, it has the toleration syntax for K8s v1.6+.
Here is a snippet from the above with annotations and tolerations
metadata:
labels:
k8s-app: calico-node
annotations:
# Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
# reserves resources for critical add-on pods so that they can be rescheduled after
# a failure. This annotation works in tandem with the toleration below.
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
hostNetwork: true
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
# Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
# This, along with the annotation above marks this pod as a critical add-on.
- key: CriticalAddonsOnly
operator: Exists