Kubernetes kubectl shows pod restarts as zero but pod age has changed

Can somebody explain why the following command shows that there have been no restarts, but the age is 2 hours when the pod was started 17 days ago?
kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
api-depl-nm-xxx 1/1 Running 0 17d xxx.xxx.xxx.xxx ip-xxx-xxx-xxx-xxx.eu-west-1.compute.internal
ei-depl-nm-xxx 1/1 Running 0 2h xxx.xxx.xxx.xxx ip-xxx-xxx-xxx-xxx.eu-west-1.compute.internal
jenkins-depl-nm-xxx 1/1 Running 0 2h xxx.xxx.xxx.xxx ip-xxx-xxx-xxx-xxx.eu-west-1.compute.internal
The deployments have been running for 17 days:
kubectl get deploy -o wide
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINER(S) IMAGE(S) SELECTOR
api-depl-nm 1 1 1 1 17d api-depl-nm xxx name=api-depl-nm
ei-depl-nm 1 1 1 1 17d ei-depl-nm xxx name=ei-depl-nm
jenkins-depl-nm 1 1 1 1 17d jenkins-depl-nm xxx name=jenkins-depl-nm
The start time was 2 hours ago:
kubectl describe po ei-depl-nm-xxx | grep Start
Start Time: Tue, 24 Jul 2018 09:07:05 +0100
Started: Tue, 24 Jul 2018 09:10:33 +0100
The application logs show it restarted.
So why is the restart count 0?
Updated with more information in response to the answer.
I may be wrong, but I don't think the deployment was updated or scaled. It certainly was not done by me, and no one else has access to the system.
kubectl describe deployment ei-depl-nm
...
CreationTimestamp: Fri, 06 Jul 2018 17:06:24 +0100
Labels: name=ei-depl-nm
...
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
...
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: ei-depl-nm-xxx (1/1 replicas created)
Events: <none>
I may be wrong, but I don't think the worker node was restarted or shut down:
kubectl describe nodes ip-xxx.eu-west-1.compute.internal
Taints: <none>
CreationTimestamp: Fri, 06 Jul 2018 16:39:40 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 06 Jul 2018 16:39:45 +0100 Fri, 06 Jul 2018 16:39:45 +0100 RouteCreated RouteController created a route
OutOfDisk False Wed, 25 Jul 2018 16:30:36 +0100 Fri, 06 Jul 2018 16:39:40 +0100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 25 Jul 2018 16:30:36 +0100 Wed, 25 Jul 2018 02:23:01 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 25 Jul 2018 16:30:36 +0100 Wed, 25 Jul 2018 02:23:01 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Wed, 25 Jul 2018 16:30:36 +0100 Wed, 25 Jul 2018 02:23:11 +0100 KubeletReady kubelet is posting ready status
......
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default ei-depl-nm-xxx 100m (5%) 0 (0%) 0 (0%) 0 (0%)
default jenkins-depl-nm-xxx 100m (5%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-dns-xxx 260m (13%) 0 (0%) 110Mi (1%) 170Mi (2%)
kube-system kube-proxy-ip-xxx.eu-west-1.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
560m (28%) 0 (0%) 110Mi (1%) 170Mi (2%)
Events: <none>

There are two things that might have happened:
The deployment was updated or scaled:
the age of the deployment does not change
a new ReplicaSet is created and the old ReplicaSet is scaled down to zero. You can check this by running
$ kubectl describe deployment <deployment_name>
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 1m deployment-controller Scaled up replica set testdep1-75488876f6 to 1
Normal ScalingReplicaSet 1m deployment-controller Scaled down replica set testdep1-d4884df5f to 0
Pods created by the old ReplicaSet are terminated, and the new ReplicaSet creates a brand-new pod with restart count 0 and age 0 seconds.
The worker node was restarted or shut down:
the pod on the old worker node disappears
the scheduler creates a brand-new pod on the first available node (it can be the same node after a reboot) with restart count 0 and age 0 seconds.
You can check the node start events by running
kubectl describe nodes <node_name>
...
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 32s kubelet, <node-name> Starting kubelet.
Normal NodeHasSufficientPID 31s (x5 over 32s) kubelet, <node-name> Node <node-name> status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 31s kubelet, <node-name> Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 30s (x6 over 32s) kubelet, <node-name> Node <node-name> status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 30s (x6 over 32s) kubelet, <node-name> Node <node-name> status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 30s (x6 over 32s) kubelet, <node-name> Node <node-name> status is now: NodeHasNoDiskPressure
Normal Starting 10s kube-proxy, <node-name> Starting kube-proxy.
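In either case, you can confirm which scenario happened by comparing the pod's creation timestamp with the deployment's and the ReplicaSet's (a quick sketch using the names from the question; adjust to your own):
kubectl get pod ei-depl-nm-xxx -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
kubectl get deployment ei-depl-nm -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
kubectl get replicaset -l name=ei-depl-nm
If the pod is much younger than its ReplicaSet, the node (or the container runtime) went away and the pod was recreated; if the ReplicaSet itself is young, there was a rollout.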

Related

Using GPU with Kubernetes GKE and Node auto-provisioning

I'm trying to do something fairly simple: run a GPU machine in a k8s cluster using auto-provisioning. When deploying the Pod with a limits: nvidia.com/gpu specification, auto-provisioning correctly creates a node pool and scales up an appropriate node. However, the Pod stays at Pending with the following message:
Warning FailedScheduling 59s (x5 over 2m46s) default-scheduler 0/10 nodes are available: 10 Insufficient nvidia.com/gpu.
It seems like taints and tolerations are added correctly by GKE. It just doesn't scale up.
I've followed the instructions here:
https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers
To reproduce:
Create a new cluster in a zone with auto-provisioning that includes GPUs (I have replaced my own project name with MYPROJECT). This command is what comes out of the console when these changes are made:
gcloud beta container --project "MYPROJECT" clusters create "cluster-2" --zone "europe-west4-a" --no-enable-basic-auth --cluster-version "1.18.12-gke.1210" --release-channel "regular" --machine-type "e2-medium" --image-type "COS" --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes "1" --enable-stackdriver-kubernetes --enable-ip-alias --network "projects/MYPROJECT/global/networks/default" --subnetwork "projects/MYPROJECT/regions/europe-west4/subnetworks/default" --default-max-pods-per-node "110" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-autoprovisioning --min-cpu 1 --max-cpu 20 --min-memory 1 --max-memory 50 --max-accelerator type="nvidia-tesla-p100",count=1 --enable-autoprovisioning-autorepair --enable-autoprovisioning-autoupgrade --autoprovisioning-max-surge-upgrade 1 --autoprovisioning-max-unavailable-upgrade 0 --enable-vertical-pod-autoscaling --enable-shielded-nodes --node-locations "europe-west4-a"
Install NVIDIA drivers by installing DaemonSet:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
Deploy pod that requests GPU:
my-gpu-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0-runtime-ubuntu18.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 1
kubectl apply -f my-gpu-pod.yaml
Help would be really appreciated as I've spent quite some time on this now :)
Edit: Here are the Pod and Node specifications (the node that was auto-scaled):
Name: my-gpu-pod
Namespace: default
Priority: 0
Node: <none>
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
my-gpu-container:
Image: nvidia/cuda:11.0-runtime-ubuntu18.04
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
--
Args:
while true; do sleep 600; done;
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9rvjz (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-9rvjz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9rvjz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nvidia.com/gpu:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 11m cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added):
Warning FailedScheduling 5m54s (x6 over 11m) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
Warning FailedScheduling 54s (x7 over 5m37s) default-scheduler 0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
Name: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=n1-standard-1
beta.kubernetes.io/os=linux
cloud.google.com/gke-accelerator=nvidia-tesla-p100
cloud.google.com/gke-boot-disk=pd-standard
cloud.google.com/gke-nodepool=nap-n1-standard-1-gpu1-18jc7z9w
cloud.google.com/gke-os-distribution=cos
cloud.google.com/machine-family=n1
failure-domain.beta.kubernetes.io/region=europe-west4
failure-domain.beta.kubernetes.io/zone=europe-west4-a
kubernetes.io/arch=amd64
kubernetes.io/hostname=gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
kubernetes.io/os=linux
node.kubernetes.io/instance-type=n1-standard-1
topology.gke.io/zone=europe-west4-a
topology.kubernetes.io/region=europe-west4
topology.kubernetes.io/zone=europe-west4-a
Annotations: container.googleapis.com/instance_id: 7877226485154959129
csi.volume.kubernetes.io/nodeid:
{"pd.csi.storage.gke.io":"projects/exor-arctic/zones/europe-west4-a/instances/gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2"}
node.alpha.kubernetes.io/ttl: 0
node.gke.io/last-applied-node-labels:
cloud.google.com/gke-accelerator=nvidia-tesla-p100,cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-nodepool=nap-n1-standar...
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 22 Mar 2021 11:32:17 +0100
Taints: nvidia.com/gpu=present:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
AcquireTime: <unset>
RenewTime: Mon, 22 Mar 2021 11:38:58 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
KernelDeadlock False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 KernelHasNoDeadlock kernel has no deadlock
ReadonlyFilesystem False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 FilesystemIsNotReadOnly Filesystem is not read-only
CorruptDockerOverlay2 False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoCorruptDockerOverlay2 docker overlay2 is functioning properly
FrequentUnregisterNetDevice False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentUnregisterNetDevice node is functioning properly
FrequentKubeletRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentKubeletRestart kubelet is functioning properly
FrequentDockerRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentDockerRestart docker is functioning properly
FrequentContainerdRestart False Mon, 22 Mar 2021 11:37:25 +0100 Mon, 22 Mar 2021 11:32:23 +0100 NoFrequentContainerdRestart containerd is functioning properly
NetworkUnavailable False Mon, 22 Mar 2021 11:32:18 +0100 Mon, 22 Mar 2021 11:32:18 +0100 RouteCreated NodeController create implicit route
MemoryPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:17 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 22 Mar 2021 11:37:49 +0100 Mon, 22 Mar 2021 11:32:19 +0100 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.164.0.16
ExternalIP: 35.204.55.105
InternalDNS: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2.c.exor-arctic.internal
Hostname: gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2.c.exor-arctic.internal
Capacity:
attachable-volumes-gce-pd: 127
cpu: 1
ephemeral-storage: 98868448Ki
hugepages-2Mi: 0
memory: 3776196Ki
pods: 110
Allocatable:
attachable-volumes-gce-pd: 127
cpu: 940m
ephemeral-storage: 47093746742
hugepages-2Mi: 0
memory: 2690756Ki
pods: 110
System Info:
Machine ID: 307671eefc01914a7bfacf17a48e087e
System UUID: 307671ee-fc01-914a-7bfa-cf17a48e087e
Boot ID: acd58f3b-1659-494c-b83d-427f834d23a6
Kernel Version: 5.4.49+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.9
Kubelet Version: v1.18.12-gke.1210
Kube-Proxy Version: v1.18.12-gke.1210
PodCIDR: 10.100.1.0/24
PodCIDRs: 10.100.1.0/24
ProviderID: gce://exor-arctic/europe-west4-a/gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system fluentbit-gke-k22gv 100m (10%) 0 (0%) 200Mi (7%) 500Mi (19%) 6m46s
kube-system gke-metrics-agent-5fblx 3m (0%) 0 (0%) 50Mi (1%) 50Mi (1%) 6m47s
kube-system kube-proxy-gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 100m (10%) 0 (0%) 0 (0%) 0 (0%) 6m44s
kube-system nvidia-driver-installer-vmw8r 150m (15%) 0 (0%) 0 (0%) 0 (0%) 6m45s
kube-system nvidia-gpu-device-plugin-8vqsl 50m (5%) 50m (5%) 10Mi (0%) 10Mi (0%) 6m45s
kube-system pdcsi-node-k9brg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6m47s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 403m (42%) 50m (5%)
memory 260Mi (9%) 560Mi (21%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-gce-pd 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 6m47s kubelet Starting kubelet.
Normal NodeAllocatableEnforced 6m47s kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 6m46s (x4 over 6m47s) kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeHasSufficientPID
Normal NodeReady 6m45s kubelet Node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 status is now: NodeReady
Normal Starting 6m44s kube-proxy Starting kube-proxy.
Warning NodeSysctlChange 6m41s sysctl-monitor
Warning ContainerdStart 6m41s systemd-monitor Starting containerd container runtime...
Warning DockerStart 6m41s (x2 over 6m41s) systemd-monitor Starting Docker Application Container Engine...
Warning KubeletStart 6m41s systemd-monitor Started Kubernetes kubelet.
As per the Kubernetes Documentation https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#nvidia-gpu-device-plugin-used-by-gce, we are supposed to use https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/daemonset.yaml.
So can you run:
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/daemonset.yaml
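Once the DaemonSet pods are running, a quick way to confirm that the GPU node actually advertises the resource (a hedged check; the node name is taken from your describe output):
kubectl get pods -n kube-system | grep nvidia
kubectl describe node gke-cluster-1-nap-n1-standard-1-gpu1--39fe3143-s8x2 | grep -i nvidia.com/gpu
If Capacity and Allocatable show nvidia.com/gpu: 1, the device plugin is working and the scheduler should be able to place the pod.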
A common error with GKE relates to project quotas limiting resources; this can lead to nodes not auto-provisioning or scaling up because the requested resources cannot be assigned.
Your project quota for GPUs (or specifically for nvidia-tesla-p100) may be set to 0 or to a number well below the requested one.
In this link there's more information about how to check it and how to request more resources for your quota.
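As a rough sketch of how to check (assuming the gcloud SDK is configured for the project), the regional quota for P100 GPUs shows up in the regions describe output:
gcloud compute regions describe europe-west4 --project MYPROJECT | grep -B1 -A1 NVIDIA_P100
If the limit line reads 0, the autoscaler will never be able to bring up a P100 node and the quota has to be increased first.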
Also, I see that you're making use of shared-core E2 instances, which are not compatible with accelerators. It shouldn't be an issue, as GKE should automatically change the machine type to N1 if it detects that the workload contains a GPU (as seen in this link), but still, maybe try running the cluster with other machine types such as N1.
You might be having a problem with the scopes.
When using node auto-provisioning with GPUs, the auto-provisioned node pools by default do not have sufficient scopes to run the installation DaemonSet. You need to manually change the default autoprovisioning scopes to enable that.
In this case the documented scopes required at the time of writing are:
[
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring",
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/compute"
]
This article mentions this very issue: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#using_node_auto-provisioning_with_gpus
You might just have to expand the scopes and retry. It works when you create the node pool manually because your own configuration already has the necessary scopes.
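Something along these lines should widen the scopes used for auto-provisioned node pools (a sketch only; the flag name and availability can vary with the gcloud version, so check gcloud container clusters update --help):
gcloud container clusters update cluster-2 --zone europe-west4-a \
  --enable-autoprovisioning \
  --autoprovisioning-scopes=https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/compute
Scope changes only apply to newly created node pools, so the auto-provisioned GPU node pool may need to be deleted so it gets recreated with the new scopes.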

There is no ephemeral-storage resource on worker node of kubernetes

I tried to set up a Kubernetes worker node on a board using the arm64 architecture.
This worker node never changed from NotReady to Ready status.
I checked the Conditions using the command below:
$ kubectl describe nodes
...
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 02 Dec 2020 14:37:46 +0900 Wed, 02 Dec 2020 14:34:35 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 02 Dec 2020 14:37:46 +0900 Wed, 02 Dec 2020 14:34:35 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 02 Dec 2020 14:37:46 +0900 Wed, 02 Dec 2020 14:34:35 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 02 Dec 2020 14:37:46 +0900 Wed, 02 Dec 2020 14:34:35 +0900 KubeletNotReady [container runtime status check may not have completed yet, runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, missing node capacity for resources: ephemeral-storage]
...
Capacity:
cpu: 8
memory: 7770600Ki
pods: 110
Allocatable:
cpu: 8
memory: 7668200Ki
pods: 110
...
This worker node does not seem to have an ephemeral-storage resource, which appears to be why this message is produced:
"[container runtime status check may not have completed yet, runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, missing node capacity for resources: ephemeral-storage]"
But the root filesystem is mounted on / as follows:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 23602256 6617628 15945856 30% /
devtmpfs 3634432 0 3634432 0% /dev
tmpfs 3885312 0 3885312 0% /dev/shm
tmpfs 3885312 100256 3785056 3% /run
tmpfs 3885312 0 3885312 0% /sys/fs/cgroup
tmpfs 524288 25476 498812 5% /tmp
tmpfs 524288 212 524076 1% /var/volatile
tmpfs 777060 0 777060 0% /run/user/1000
/dev/sde4 122816 49088 73728 40% /firmware
/dev/sde5 65488 608 64880 1% /bt_firmware
/dev/sde7 28144 20048 7444 73% /dsp
How can the ephemeral-storage resource be detected on a Kubernetes worker node?
=======================================================================
I added the full output of $ kubectl get nodes and $ kubectl describe nodes.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
raas-linux Ready master 6m25s v1.19.4
robot-dd9f6aaa NotReady <none> 5m16s v1.16.2-dirty
$
$ kubectl describe nodes
Name: raas-linux
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=raas-linux
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"a6:a1:0b:43:38:29"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.3.106
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 04 Dec 2020 09:54:49 +0900
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: raas-linux
AcquireTime: <unset>
RenewTime: Fri, 04 Dec 2020 10:00:19 +0900
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 04 Dec 2020 09:55:14 +0900 Fri, 04 Dec 2020 09:55:14 +0900 FlannelIsUp Flannel is running on this node
MemoryPressure False Fri, 04 Dec 2020 09:55:19 +0900 Fri, 04 Dec 2020 09:54:45 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 04 Dec 2020 09:55:19 +0900 Fri, 04 Dec 2020 09:54:45 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 04 Dec 2020 09:55:19 +0900 Fri, 04 Dec 2020 09:54:45 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 04 Dec 2020 09:55:19 +0900 Fri, 04 Dec 2020 09:55:19 +0900 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.3.106
Hostname: raas-linux
Capacity:
cpu: 8
ephemeral-storage: 122546800Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8066548Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 112939130694
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7964148Ki
pods: 110
System Info:
Machine ID: 5aa3b32d7e9e409091929e7cba2d558b
System UUID: a930a228-a79a-11e5-9e9a-147517224400
Boot ID: 4e6dd5d2-bcc4-433b-8c4d-df56c33a9442
Kernel Version: 5.4.0-53-generic
OS Image: Ubuntu 18.04.5 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.10
Kubelet Version: v1.19.4
Kube-Proxy Version: v1.19.4
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-f9fd979d6-h7hd5 100m (1%) 0 (0%) 70Mi (0%) 170Mi (2%) 5m9s
kube-system coredns-f9fd979d6-hbkbl 100m (1%) 0 (0%) 70Mi (0%) 170Mi (2%) 5m9s
kube-system etcd-raas-linux 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5m20s
kube-system kube-apiserver-raas-linux 250m (3%) 0 (0%) 0 (0%) 0 (0%) 5m20s
kube-system kube-controller-manager-raas-linux 200m (2%) 0 (0%) 0 (0%) 0 (0%) 5m20s
kube-system kube-flannel-ds-k8b2d 100m (1%) 100m (1%) 50Mi (0%) 50Mi (0%) 5m9s
kube-system kube-proxy-wgn4l 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5m9s
kube-system kube-scheduler-raas-linux 100m (1%) 0 (0%) 0 (0%) 0 (0%) 5m20s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 850m (10%) 100m (1%)
memory 190Mi (2%) 390Mi (5%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 5m20s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 5m20s kubelet Node raas-linux status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 5m20s kubelet Node raas-linux status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 5m20s kubelet Node raas-linux status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 5m20s kubelet Updated Node Allocatable limit across pods
Normal Starting 5m8s kube-proxy Starting kube-proxy.
Normal NodeReady 5m kubelet Node raas-linux status is now: NodeReady
Name: robot-dd9f6aaa
Roles: <none>
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/os=linux
kubernetes.io/arch=arm64
kubernetes.io/hostname=robot-dd9f6aaa
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 04 Dec 2020 09:55:58 +0900
Taints: node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: robot-dd9f6aaa
AcquireTime: <unset>
RenewTime: Fri, 04 Dec 2020 10:00:16 +0900
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 04 Dec 2020 09:55:58 +0900 Fri, 04 Dec 2020 09:55:58 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 04 Dec 2020 09:55:58 +0900 Fri, 04 Dec 2020 09:55:58 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 04 Dec 2020 09:55:58 +0900 Fri, 04 Dec 2020 09:55:58 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Fri, 04 Dec 2020 09:55:58 +0900 Fri, 04 Dec 2020 09:55:58 +0900 KubeletNotReady [container runtime status check may not have completed yet, runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, missing node capacity for resources: ephemeral-storage]
Addresses:
InternalIP: 192.168.3.102
Hostname: robot-dd9f6aaa
Capacity:
cpu: 8
memory: 7770620Ki
pods: 110
Allocatable:
cpu: 8
memory: 7668220Ki
pods: 110
System Info:
Machine ID: de6c58c435a543de8e13ce6a76477fa0
System UUID: de6c58c435a543de8e13ce6a76477fa0
Boot ID: d0999dd7-ab7d-4459-b0cd-9b25f5a50ae4
Kernel Version: 4.9.103-sda845-smp
OS Image: Kairos - Smart Machine Platform 1.0
Operating System: linux
Architecture: arm64
Container Runtime Version: docker://19.3.2
Kubelet Version: v1.16.2-dirty
Kube-Proxy Version: v1.16.2-dirty
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system kube-flannel-ds-9xc6n 100m (1%) 100m (1%) 50Mi (0%) 50Mi (0%) 4m21s
kube-system kube-proxy-4dk7f 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m21s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (1%) 100m (1%)
memory 50Mi (0%) 50Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 4m22s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 4m21s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 4m21s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 4m21s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal Starting 4m10s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 4m10s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 4m10s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 4m10s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal Starting 3m59s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 3m59s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 3m59s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 3m59s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal Starting 3m48s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 3m48s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 3m48s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal NodeHasNoDiskPressure 3m48s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal Starting 3m37s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 3m36s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 3m36s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 3m36s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal Starting 3m25s kubelet Starting kubelet.
Normal Starting 3m14s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 3m3s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal Starting 3m3s kubelet Starting kubelet.
Normal Starting 2m52s kubelet Starting kubelet.
Normal Starting 2m40s kubelet Starting kubelet.
Normal Starting 2m29s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 2m29s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 2m29s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 2m29s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 2m18s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 2m18s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 2m18s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal Starting 2m18s kubelet Starting kubelet.
Normal Starting 2m7s kubelet Starting kubelet.
Normal Starting 115s kubelet Starting kubelet.
Normal NodeHasNoDiskPressure 104s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal Starting 104s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 104s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal Starting 93s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 93s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 93s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 93s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal Starting 82s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 82s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal Starting 71s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 70s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 70s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 70s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientPID
Normal Starting 59s kubelet Starting kubelet.
Normal Starting 48s kubelet Starting kubelet.
Normal Starting 37s kubelet Starting kubelet.
Normal Starting 26s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 25s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal Starting 15s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 14s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal Starting 3s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 3s kubelet Node robot-dd9f6aaa status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 3s kubelet Node robot-dd9f6aaa status is now: NodeHasNoDiskPressure
Delete the /etc/docker/daemon.json file and reboot.
Install the CNI plugin binaries into the /opt/cni/bin directory:
https://github.com/containernetworking/plugins/releases/download/v0.8.7/cni-plugins-linux-arm64-v0.8.7.tgz
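A rough sketch of installing those binaries on the arm64 worker (run as root; restarting the kubelet afterwards assumes it is managed by systemd):
wget https://github.com/containernetworking/plugins/releases/download/v0.8.7/cni-plugins-linux-arm64-v0.8.7.tgz
mkdir -p /opt/cni/bin
tar -xzf cni-plugins-linux-arm64-v0.8.7.tgz -C /opt/cni/bin
systemctl restart kubelet   # or reboot, as in the first suggestion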
Step 1: kubectl get mutatingwebhookconfigurations -oyaml > mutating.txt
Step 2: kubectl delete -f mutating.txt
Step 3: Restart the node
Step 4: You should be able to see that the node is ready
Step 5: Reinstall the mutatingwebhookconfiguration
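The same steps as commands (a sketch; the restart assumes you can reboot the node directly):
kubectl get mutatingwebhookconfigurations -oyaml > mutating.txt
kubectl delete -f mutating.txt
sudo reboot                  # on the affected node
kubectl get nodes            # the node should now report Ready
kubectl apply -f mutating.txt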

Minikube pod keep a forever pending status and failed to be scheduled

I am a complete beginner with Kubernetes. Sorry if this is a dumb question.
I am using minikube and kvm2 (5.0.0). Here is the info about the minikube and kubectl versions.
Minikube status output
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
kubectl cluster-info output:
Kubernetes master is running at https://127.0.0.1:32768
KubeDNS is running at https://127.0.0.1:32768/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
I am trying to deploy a pod using kubectl apply -f client-pod.yaml. Here is my client-pod.yaml configuration:
apiVersion: v1
kind: Pod
metadata:
  name: client-pod
  labels:
    component: web
spec:
  containers:
  - name: client
    image: stephengrider/multi-client
    ports:
    - containerPort: 3000
This is the kubectl get pods output:
NAME READY STATUS RESTARTS AGE
client-pod 0/1 Pending 0 4m15s
kubectl describe pods output:
Name: client-pod
Namespace: default
Priority: 0
Node: <none>
Labels: component=web
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"component":"web"},"name":"client-pod","namespace":"default"},"spec...
Status: Pending
IP:
IPs: <none>
Containers:
client:
Image: stephengrider/multi-client
Port: 3000/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-z45bq (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-z45bq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-z45bq
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
I have been searching for a way to see which taint is stopping the pod from being scheduled, without luck.
Is there a way to see the taint that is failing?
kubectl get nodes output:
NAME STATUS ROLES AGE VERSION
m01 Ready master 11h v1.17.3
-- EDIT --
kubectl describe nodes output:
Name: home-pc
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=home-pc
kubernetes.io/os=linux
minikube.k8s.io/commit=eb13446e786c9ef70cb0a9f85a633194e62396a1
minikube.k8s.io/name=minikube
minikube.k8s.io/updated_at=2020_03_17T22_51_28_0700
minikube.k8s.io/version=v1.8.2
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 17 Mar 2020 22:51:25 -0500
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: home-pc
AcquireTime: <unset>
RenewTime: Tue, 17 Mar 2020 22:51:41 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:21 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 17 Mar 2020 22:51:41 -0500 Tue, 17 Mar 2020 22:51:41 -0500 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.0.12
Hostname: home-pc
Capacity:
cpu: 12
ephemeral-storage: 227688908Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8159952Ki
pods: 110
Allocatable:
cpu: 12
ephemeral-storage: 209838097266
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8057552Ki
pods: 110
System Info:
Machine ID: 339d426453b4492da92f75d06acc1e0d
System UUID: 62eedb55-444f-61ce-75e9-b06ebf3331a0
Boot ID: a9ae9889-d7cb-48c5-ae75-b2052292ac7a
Kernel Version: 5.0.0-38-generic
OS Image: Ubuntu 19.04
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.5
Kubelet Version: v1.17.3
Kube-Proxy Version: v1.17.3
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-6955765f44-mbwqt 100m (0%) 0 (0%) 70Mi (0%) 170Mi (2%) 10s
kube-system coredns-6955765f44-sblf2 100m (0%) 0 (0%) 70Mi (0%) 170Mi (2%) 10s
kube-system etcd-home-pc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-apiserver-home-pc 250m (2%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-controller-manager-home-pc 200m (1%) 0 (0%) 0 (0%) 0 (0%) 13s
kube-system kube-proxy-lk7xs 0 (0%) 0 (0%) 0 (0%) 0 (0%) 10s
kube-system kube-scheduler-home-pc 100m (0%) 0 (0%) 0 (0%) 0 (0%) 12s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (6%) 0 (0%)
memory 140Mi (1%) 340Mi (4%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 24s kubelet, home-pc Starting kubelet.
Normal NodeHasSufficientMemory 23s (x4 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 23s (x3 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 23s (x3 over 24s) kubelet, home-pc Node home-pc status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 23s kubelet, home-pc Updated Node Allocatable limit across pods
Normal Starting 13s kubelet, home-pc Starting kubelet.
Normal NodeHasSufficientMemory 13s kubelet, home-pc Node home-pc status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 13s kubelet, home-pc Node home-pc status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 13s kubelet, home-pc Node home-pc status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 13s kubelet, home-pc Updated Node Allocatable limit across pods
Normal Starting 9s kube-proxy, home-pc Starting kube-proxy.
Normal NodeReady 3s kubelet, home-pc Node home-pc status is now: NodeReady
You have some taints on the node which are stopping the scheduler from placing the pod. Either remove the taint from the master node or add a toleration to the pod spec.
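To see the taints on every node and, if the master taint is the culprit, remove it (a sketch; the node name m01 is taken from your kubectl get nodes output):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
kubectl describe node m01 | grep -i taints
kubectl taint nodes m01 node-role.kubernetes.io/master-
Alternatively, a toleration for node-role.kubernetes.io/master can be added under spec.tolerations in client-pod.yaml.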

How to Change Kubernetes Node Status from "Ready" to "NotReady" by changing CPU Utilization or memory utilization or Disk Pressure?

I have a Kubernetes cluster set up with 1 master and 1 worker node. For testing purposes, I increased CPU and memory utilization up to 100%, but the node still does not get the status "NotReady".
I am testing the pressure statuses. How can I change the MemoryPressure, DiskPressure, or PIDPressure condition flag to True?
Here are my master node conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 27 Nov 2019 14:36:29 +0000 Wed, 27 Nov 2019 14:36:29 +0000 WeaveIsUp Weave pod has set this
MemoryPressure False Thu, 28 Nov 2019 07:36:46 +0000 Fri, 22 Nov 2019 13:30:38 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 28 Nov 2019 07:36:46 +0000 Fri, 22 Nov 2019 13:30:38 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 28 Nov 2019 07:36:46 +0000 Fri, 22 Nov 2019 13:30:38 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 28 Nov 2019 07:36:46 +0000 Fri, 22 Nov 2019 13:30:48 +0000 KubeletReady kubelet is posting ready status
Here is the pod info:
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-5644d7b6d9-dm8v7 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 22d
kube-system coredns-5644d7b6d9-mz5rm 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 22d
kube-system etcd-ip-172-31-28-186.us-east-2.compute.internal 0 (0%) 0 (0%) 0 (0%) 0 (0%) 22d
kube-system kube-apiserver-ip-172-31-28-186.us-east-2.compute.internal 250m (12%) 0 (0%) 0 (0%) 0 (0%) 22d
kube-system kube-controller-manager-ip-172-31-28-186.us-east-2.compute.internal 200m (10%) 0 (0%) 0 (0%) 0 (0%) 22d
kube-system kube-proxy-cw8vv 0 (0%) 0 (0%) 0 (0%) 0 (0%) 22d
kube-system kube-scheduler-ip-172-31-28-186.us-east-2.compute.internal 100m (5%) 0 (0%) 0 (0%) 0 (0%) 22d
kube-system weave-net-ct9zb 20m (1%) 0 (0%) 0 (0%) 0 (0%) 22d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 770m (38%) 0 (0%)
memory 140Mi (1%) 340Mi (4%)
ephemeral-storage 0 (0%) 0 (0%)
Here are the worker node conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 28 Nov 2019 07:00:08 +0000 Thu, 28 Nov 2019 07:00:08 +0000 WeaveIsUp Weave pod has set this
MemoryPressure False Thu, 28 Nov 2019 07:39:03 +0000 Thu, 28 Nov 2019 07:00:00 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 28 Nov 2019 07:39:03 +0000 Thu, 28 Nov 2019 07:00:00 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 28 Nov 2019 07:39:03 +0000 Thu, 28 Nov 2019 07:00:00 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 28 Nov 2019 07:39:03 +0000 Thu, 28 Nov 2019 07:00:00 +0000 KubeletReady kubelet is posting ready status
There are several ways of making a node go into NotReady status, but not through Pods. When a Pod starts to consume too much memory, the kubelet will just kill that Pod, precisely to protect the node.
I guess you want to test what happens when a node goes down; in that case you want to drain it. In other words, to simulate node issues, you should do:
kubectl drain NODE
Still, check kubectl drain --help to see what happens under which circumstances.
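For example (a sketch; DaemonSet pods have to be ignored explicitly, and the emptyDir flag is called --delete-emptydir-data in newer kubectl versions):
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
kubectl get nodes            # the drained node shows SchedulingDisabled
kubectl uncordon <node-name> # bring it back afterwards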
EDIT
I tried, actually, accessing the node and running stress on it directly, and this is what happened within 20 seconds:
root@gke-klusta-lemmy-3ce02acd-djhm:/# stress --cpu 16 --io 8 --vm 8 --vm-bytes 2G
Checking on the node:
$ kubectl get no -w | grep gke-klusta-lemmy-3ce02acd-djhm
gke-klusta-lemmy-3ce02acd-djhm Ready <none> 15d v1.13.11-gke.14
gke-klusta-lemmy-3ce02acd-djhm Ready <none> 15d v1.13.11-gke.14
gke-klusta-lemmy-3ce02acd-djhm NotReady <none> 15d v1.13.11-gke.14
gke-klusta-lemmy-3ce02acd-djhm NotReady <none> 15d v1.13.11-gke.14
gke-klusta-lemmy-3ce02acd-djhm NotReady <none> 15d v1.13.11-gke.14
I am running pretty weak nodes: 1 CPU / 4 GB RAM.
Just bring down the kubelet service on the node that you want to be in NotReady status.
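On a systemd-managed node that would be something like (a sketch; the controller marks the node NotReady after the node-monitor grace period, 40s by default):
sudo systemctl stop kubelet
kubectl get nodes -w         # the node flips to NotReady after ~40s
sudo systemctl start kubelet # and back to Ready once kubelet reports again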

kubectl master node NotReady: Starting kube-proxy

I just installed Kubernetes on a cluster of Ubuntu machines, following the DigitalOcean guide that uses Ansible. Everything seems to be fine, but when verifying the cluster, the master node is in a NotReady status:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
jwdkube-master-01 NotReady master 44m v1.12.2
jwdkube-worker-01 Ready <none> 44m v1.12.2
jwdkube-worker-02 Ready <none> 44m v1.12.2
This is the version:
# kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
When I check the master node, kube-proxy is hanging in starting mode:
# kubectl describe nodes jwdkube-master-01
Name: jwdkube-master-01
Roles: master
...
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Thu, 08 Nov 2018 10:24:45 +0000 Thu, 08 Nov 2018 09:36:10 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Thu, 08 Nov 2018 10:24:45 +0000 Thu, 08 Nov 2018 09:36:10 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 08 Nov 2018 10:24:45 +0000 Thu, 08 Nov 2018 09:36:10 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 08 Nov 2018 10:24:45 +0000 Thu, 08 Nov 2018 09:36:10 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Thu, 08 Nov 2018 10:24:45 +0000 Thu, 08 Nov 2018 09:36:10 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
InternalIP: 104.248.207.107
Hostname: jwdkube-master-01
Capacity:
cpu: 1
ephemeral-storage: 25226960Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1008972Ki
pods: 110
Allocatable:
cpu: 1
ephemeral-storage: 23249166298
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 906572Ki
pods: 110
System Info:
Machine ID: 771c0f669c0a40a1ba7c28bf1f05a637
System UUID: 771c0f66-9c0a-40a1-ba7c-28bf1f05a637
Boot ID: 2532ae4d-c08c-45d8-b94c-6e88912ed627
Kernel Version: 4.18.0-10-generic
OS Image: Ubuntu 18.10
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.2
Kube-Proxy Version: v1.12.2
PodCIDR: 10.244.0.0/24
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system etcd-jwdkube-master-01 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-jwdkube-master-01 250m (25%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-jwdkube-master-01 200m (20%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-p8cbq 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-jwdkube-master-01 100m (10%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 550m (55%) 0 (0%)
memory 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasSufficientDisk 48m (x6 over 48m) kubelet, jwdkube-master-01 Node jwdkube-master-01 status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 48m (x6 over 48m) kubelet, jwdkube-master-01 Node jwdkube-master-01 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 48m (x6 over 48m) kubelet, jwdkube-master-01 Node jwdkube-master-01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 48m (x5 over 48m) kubelet, jwdkube-master-01 Node jwdkube-master-01 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 48m kubelet, jwdkube-master-01 Updated Node Allocatable limit across pods
Normal Starting 48m kube-proxy, jwdkube-master-01 Starting kube-proxy.
Update:
running kubectl get pods -n kube-system:
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-8p7k2 1/1 Running 0 4h47m
coredns-576cbf47c7-s5tlv 1/1 Running 0 4h47m
etcd-jwdkube-master-01 1/1 Running 1 140m
kube-apiserver-jwdkube-master-01 1/1 Running 1 140m
kube-controller-manager-jwdkube-master-01 1/1 Running 1 140m
kube-flannel-ds-5bzrx 1/1 Running 0 4h47m
kube-flannel-ds-bfs9k 1/1 Running 0 4h47m
kube-proxy-4lrzw 1/1 Running 1 4h47m
kube-proxy-57x28 1/1 Running 0 4h47m
kube-proxy-j8bf5 1/1 Running 0 4h47m
kube-scheduler-jwdkube-master-01 1/1 Running 1 140m
tiller-deploy-6f6fd74b68-5xt54 1/1 Running 0 112m
It seems to be a compatibility problem between Flannel v0.9.1 and Kubernetes cluster v1.12.2. Once you replace the URL in the master configuration playbook it should help:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
To enforce this solution on the current cluster:
On the master node delete relevant objects for Flannel v0.9.1:
kubectl delete clusterrole flannel -n kube-system
kubectl delete clusterrolebinding flannel -n kube-system
kubectl delete serviceaccount flannel -n kube-system
kubectl delete configmap kube-flannel-cfg -n kube-system
kubectl delete daemonset.extensions kube-flannel-ds -n kube-system
Also proceed with deleting the Flannel Pods:
kubectl delete pod kube-flannel-ds-5bzrx -n kube-system
kubectl delete pod kube-flannel-ds-bfs9k -n kube-system
And check that no more Flannel-related objects exist:
kubectl get all --all-namespaces
Install the latest Flannel version on your cluster:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
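Afterwards, a quick check that the new Flannel pods come up and the master turns Ready:
kubectl get pods -n kube-system | grep flannel
kubectl get nodes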
It works for me; however, if you discover any further problems, write a comment below this answer.