kubernetes local cluster create pods got errors like ‘ErrImagePull’ and ‘ImagePullBackOff’ - kubernetes

I just installed a local Kubernetes cluster, but when I ran the command
cluster/kubectl.sh run my-nginx --image=nginx --replicas=2 --port=80
to create and run pods, this is what I got:
NAME READY STATUS RESTARTS AGE
my-nginx-00t7f 0/1 ContainerCreating 0 23m
my-nginx-spy2b 0/1 ContainerCreating 0 23m
and when I used kubectl logs, I got
Pod "my-nginx-00t7f" in namespace "default" : pod is not in 'Running', 'Succeeded' or 'Failed' state - State: "Pending"
It seems stuck in 'Pending' status.
Then I used 'kubectl describe' and got
Name: my-nginx-00t7f
Namespace: default
Image(s): nginx
Node: 127.0.0.1/127.0.0.1
Start Time: Thu, 17 Dec 2015 22:27:18 +0800
Labels: run=my-nginx
Status: Pending
Reason:
Message:
IP:
Replication Controllers: my-nginx (2/2 replicas created)
Containers:
my-nginx:
Container ID:
Image: nginx
Image ID:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
default-token-p09p6:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-p09p6
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
26m 26m 1 {scheduler } Normal Scheduled Successfully assigned my-nginx-00t7f to 127.0.0.1
22m 1m 79 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: ImagePullBackOff
24m 5s 8 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: ErrImagePull
It seems my Docker cannot pull images, but it actually can: there is no problem when I run docker pull nginx.

I assume that, from the kubelet logs, you figured out it was the pause container that couldn't be pulled.
Kubernetes needs to create a container for the pod in order to hold shared resources, such as the network namespace. It uses the pause container for this, which is a very small container that just sleeps forever.
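A quick way to confirm and work around this is to pre-pull the pause image on the node yourself, or to point the kubelet at an image it can actually reach. The exact image name and tag vary by Kubernetes version and setup, so treat the ones below as placeholders and take the real name from your kubelet logs (if your kubelet isn't running under systemd, check wherever its logs go):
# find out which infra/pause image the kubelet is failing to pull
journalctl -u kubelet | grep -i pause
# pre-pull it manually so the kubelet finds it locally (example tag only)
docker pull gcr.io/google_containers/pause:2.0
# or tell the kubelet explicitly which pause image to use
kubelet --pod-infra-container-image=gcr.io/google_containers/pause:2.0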

If your container remains in Pending status, then please check the kube-scheduler service. If it is in a stopped state, turn it on and check again.
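How you check it depends on how the cluster was set up; roughly, for a control plane running as system services versus one running as pods:
# control plane components installed as system services
systemctl status kube-scheduler
systemctl start kube-scheduler    # if it is stopped
# control plane components running as pods
kubectl get pods -n kube-system | grep scheduler
kubectl get componentstatuses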

Related

kubernetes pod (mssql-tools) failing with CrashLoopBackOff error and restarting

I'm using Rancher Desktop for K8s in WSL 2 on Windows 11.
I'm trying to create a pod using this simple YAML:
apiVersion: v1
kind: Pod
metadata:
  name: mssql-tools
  labels:
    name: mssql-tools
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
But it is continuously giving CrashLoopBackOff error.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mssql-tools 0/1 CrashLoopBackOff 11 (8s ago) 14m
And here is the result of kubectl describe pod mssql-tools:
$ kubectl describe pod mssql-tools
Name: mssql-tools
Namespace: default
Priority: 0
Service Account: default
Node: desktop-2ohsprk/172.22.97.204
Start Time: Mon, 26 Dec 2022 04:34:19 +0500
Labels: name=mssql-tools
Annotations: <none>
Status: Running
IP: 10.42.0.57
IPs:
IP: 10.42.0.57
Containers:
mssql-tools:
Container ID: docker://76343010f4344a5d26fb35f3b0278271d3336e8e10d695cc22e78520262f34bf
Image: mcr.microsoft.com/mssql-tools:latest
Image ID: docker-pullable://mcr.microsoft.com/mssql-tools@sha256:62556500522072535cb3df2bb5965333dded9be47000473e9e0f84118e248642
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 26 Dec 2022 04:46:20 +0500
Finished: Mon, 26 Dec 2022 04:46:20 +0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 26 Dec 2022 04:45:51 +0500
Finished: Mon, 26 Dec 2022 04:45:51 +0500
Ready: False
Restart Count: 9
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wkqlg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-wkqlg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned default/mssql-tools to desktop-2ohsprk
Normal Pulled 12m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 1.459473213s
Normal Pulled 12m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 823.403008ms
Normal Pulled 11m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 835.697509ms
Normal Pulled 11m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 873.802598ms
Normal Created 11m (x4 over 12m) kubelet Created container mssql-tools
Normal Started 11m (x4 over 12m) kubelet Started container mssql-tools
Normal Pulling 10m (x5 over 12m) kubelet Pulling image "mcr.microsoft.com/mssql-tools:latest"
Normal Pulled 10m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 740.64559ms
Warning BackOff 6m56s (x25 over 11m) kubelet Back-off restarting failed container
Normal SandboxChanged 50s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 48s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 951.332457ms
Normal Pulled 32s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 828.839917ms
Normal Pulling 4s (x3 over 49s) kubelet Pulling image "mcr.microsoft.com/mssql-tools:latest"
Normal Pulled 3s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 713.951656ms
Normal Created 3s (x3 over 48s) kubelet Created container mssql-tools
Normal Started 3s (x3 over 48s) kubelet Started container mssql-tools
Warning BackOff 2s (x5 over 47s) kubelet Back-off restarting failed container
The same container works perfectly if I run it via docker and I can use its shell to execute sqlcmd properly.
I can't figure out any reason for this.
Any help would be really appreciated.
Thanks
CrashLoopBackOff is a common error indicating that a pod failed to start and kept failing repeatedly as Kubernetes tried to restart it.
To troubleshoot this issue, follow the steps below:
Check for "Back-off restarting failed container" by running kubectl describe pod [name].
If you get Liveness probe failed and Back-off restarting failed container messages from the kubelet, this indicates the container is not responding and is in the process of being restarted.
Check the previous container instance. Run kubectl get pods to identify the pod causing the CrashLoopBackOff error, then run kubectl logs [name] --previous --tail 10 to get the last ten log lines from the failed instance.
Check the deployment logs by running: kubectl logs -f deploy/<deployment-name> -n <namespace>
Refer to this link for more detailed troubleshooting steps.
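Applied to the pod from this question, the commands look roughly like this (the deploy/ form only applies if the pod is managed by a Deployment; the placeholders are yours to fill in):
kubectl describe pod mssql-tools
kubectl logs mssql-tools --previous --tail 10
kubectl logs -f deploy/<deployment-name> -n <namespace>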
So after trying and digging through multiple options, it finally worked by executing the command sleep 3600000, i.e. giving the container a long-running command so that it keeps running instead of completing and exiting immediately.
Here is the working YAML:
apiVersion: v1
kind: Pod
metadata:
  name: mssql-tools
  labels:
    name: mssql-tools
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
    command: ["sleep"]
    args:
    - "3600000"
    imagePullPolicy: IfNotPresent
The command and its arguments can also be written like this:
apiVersion: v1
...
...
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
    command:
    - sleep
    - "3600000"
...
And by the way, you can also deploy a container by passing the command on the kubectl run command line, i.e.:
kubectl run mssql --image=mcr.microsoft.com/mssql-tools -n my-namespace --command -- sleep 3600000
Note: You can omit -n my-namespace if you are not deploying it into a specific namespace or are deploying it into the default namespace.
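Once the pod is kept alive with the sleep command, you can verify it and open a shell or run sqlcmd inside it, e.g. (server, user and password are placeholders; /opt/mssql-tools/bin is where the image normally puts sqlcmd):
kubectl get pod mssql-tools                 # should now show 1/1 Running
kubectl exec -it mssql-tools -- /bin/bash
kubectl exec -it mssql-tools -- /opt/mssql-tools/bin/sqlcmd -S <server> -U <user> -P <password>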

CockroachDB Cluster on Kubernetes Pods Crashing

I'm trying to install a CockroachDB Helm chart on a 2 node Kubernetes cluster using this command:
helm install my-release --set statefulset.replicas=2 stable/cockroachdb
I have already created 2 persistent volumes:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv00001 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-0 11m
pv00002 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-1 11m
I'm getting a weird error and I'm new to Kubernetes so I'm not sure what I'm doing wrong. I've tried creating a StorageClass and using it with my PVs but then the CockroachDB PVCs won't bind to them. I suspect there may be something wrong with my PV setup?
I've tried using kubectl logs but the only error I'm seeing is this:
standard_init_linux.go:211: exec user process caused "exec format error"
and the pods are crashing over and over:
NAME READY STATUS RESTARTS AGE
my-release-cockroachdb-0 0/1 Pending 0 11m
my-release-cockroachdb-1 0/1 CrashLoopBackOff 7 11m
my-release-cockroachdb-init-tfcks 0/1 CrashLoopBackOff 5 5m29s
Any idea why the pods are crashing?
Here's kubectl describe for the init pod:
Name: my-release-cockroachdb-init-tfcks
Namespace: default
Priority: 0
Node: axon/192.168.1.7
Start Time: Sat, 04 Apr 2020 00:22:19 +0100
Labels: app.kubernetes.io/component=init
app.kubernetes.io/instance=my-release
app.kubernetes.io/name=cockroachdb
controller-uid=54c7c15d-eb1c-4392-930a-d9b8e9225a45
job-name=my-release-cockroachdb-init
Annotations: <none>
Status: Running
IP: 10.44.0.1
IPs:
IP: 10.44.0.1
Controlled By: Job/my-release-cockroachdb-init
Containers:
cluster-init:
Container ID: docker://82a062c6862a9fd5047236feafe6e2654ec1f6e3064fd0513341a1e7f36eaed3
Image: cockroachdb/cockroach:v19.2.4
Image ID: docker-pullable://cockroachdb/cockroach@sha256:511b6d09d5bc42c7566477811a4e774d85d5689f8ba7a87a114b96d115b6149b
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
while true; do initOUT=$(set -x; /cockroach/cockroach init --insecure --host=my-release-cockroachdb-0.my-release-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 04 Apr 2020 00:28:04 +0100
Finished: Sat, 04 Apr 2020 00:28:04 +0100
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cz2sn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-cz2sn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cz2sn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/my-release-cockroachdb-init-tfcks to axon
Normal Pulled 5m9s (x5 over 6m45s) kubelet, axon Container image "cockroachdb/cockroach:v19.2.4" already present on machine
Normal Created 5m8s (x5 over 6m45s) kubelet, axon Created container cluster-init
Normal Started 5m8s (x5 over 6m44s) kubelet, axon Started container cluster-init
Warning BackOff 92s (x26 over 6m42s) kubelet, axon Back-off restarting failed container
When Pods crash, the most important things to check are their descriptions (kubectl describe) and their logs.
The logs of the failed Pod show that the architecture of the cockroach image doesn't match the architecture of the nodes.
Run kubectl get po -o wide to see which nodes cockroach runs on and check their architecture.
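For example (kubernetes.io/arch is the standard node label; on older clusters it may be beta.kubernetes.io/arch):
# node architecture, from the node info and the arch label
kubectl get nodes -L kubernetes.io/arch
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'
# which nodes the crashing cockroach pods were scheduled on
kubectl get po -o wide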
A 2-node CockroachDB cluster is an anti-pattern. You need 3 or more nodes to avoid data or cluster-wide unavailability when a single node fails. Consider checking out these videos explaining how data in CockroachDB is organized and then how the nodes in a cluster work together to keep data available in the face of node failure.
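For reference, the same install from the question with the minimum recommended replica count would be something like this (you would also need a third persistent volume to match):
helm install my-release --set statefulset.replicas=3 stable/cockroachdb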
Only if you have 3 nodes (or more) will you not risk losing data if any of the nodes gets corrupted. Apart from that, it's easier to explain how to do it right than to find out what went wrong, and to find out what went wrong one must go through the logs.
If you attach the log, I can take a look.
I also wrote a detailed guide that may address the "doing it right" part of my answer. I elaborated even more about the entire process here.

My pod is in Container Creating state, showing TLS handshake timeout

I can pull the image correctly with the docker pull command, but when I use the kubectl run command, my pod stays in the ContainerCreating state. How can I fix it?
[root@centos-master etc]# kubectl run my-nginx --image=nginx
deployment "my-nginx" created
[root@centos-master etc]# kubectl get pods
NAME READY STATUS RESTARTS AGE
my-nginx-2723453542-5s33f 0/1 ContainerCreating 0 7s
[root@centos-master etc]# kubectl describe pod my-nginx-2723453542-5s33f
Name: my-nginx-2723453542-5s33f
Namespace: default
Node: centos-minion-2/104.21.51.35
Start Time: Fri, 30 Aug 2019 16:11:57 +0800
Labels: pod-template-hash=2723453542
run=my-nginx
Status: Pending
IP:
Controllers: ReplicaSet/my-nginx-2723453542
Containers:
my-nginx:
Container ID:
Image: nginx
Image ID:
Port:
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
No volumes.
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 {default-scheduler } Normal Scheduled Successfully assigned my-nginx-2723453542-5s33f to centos-minion-2
<invalid> <invalid> 5 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Get https://registry.access.redhat.com/v1/_ping: proxyconnect tcp: net/http: TLS handshake timeout)"
<invalid> <invalid> 11 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redhat.com/rhel7/pod-infrastructure:latest\""
As recommended by @char and @prometherion, in order to sort out this issue you probably need to supply the KUBELET_ARGS parameters with the appropriate --pod-infra-container-image flag, as per the link provided:
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest"
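On RPM-based installs like this one, that variable typically lives in /etc/kubernetes/kubelet on each node (the path may differ on your setup); after editing it, restart the kubelet for the flag to take effect:
systemctl restart kubelet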
You can also consider the solution mentioned by @Matthew: installing the subscription-manager package and subscribing the host OS, as described here.
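If you go the subscription route, the registration is roughly as follows (credentials are placeholders):
yum install -y subscription-manager
subscription-manager register --username <rh-user> --password <rh-password> --auto-attach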

Kubernetes pod desc shows "connection refused" error

I am a Kubernetes newbie. I am running out of ideas for solving the Pod status being stuck at ContainerCreating. I am working on a sample application from AWS (https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html#eks-guestbook); the sample is very similar to the official one (https://kubernetes.io/docs/tutorials/stateless-application/guestbook/).
Many thanks to anyone giving guidance in finding the root causes:
Why do I get the connection refused error, and what does port 50051 do? Thanks.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default guestbook-8k9pp 0/1 ContainerCreating 0 15h
default guestbook-b2n49 0/1 ContainerCreating 0 15h
default guestbook-gtjnj 0/1 ContainerCreating 0 15h
default redis-master-rhwnt 0/1 ContainerCreating 0 15h
default redis-slave-b284x 0/1 ContainerCreating 0 15h
default redis-slave-vnlj4 0/1 ContainerCreating 0 15h
kube-system aws-node-jkfg8 0/1 CrashLoopBackOff 273 1d
kube-system aws-node-lpvn9 0/1 CrashLoopBackOff 273 1d
kube-system aws-node-nmwzn 0/1 Error 274 1d
kube-system kube-dns-64b69465b4-ftlm6 0/3 ContainerCreating 0 4d
kube-system kube-proxy-cxdj7 1/1 Running 0 1d
kube-system kube-proxy-g2js4 1/1 Running 0 1d
kube-system kube-proxy-rhq6v 1/1 Running 0 1d
$ kubectl describe pod guestbook-8k9pp
Name: guestbook-8k9pp
Namespace: default
Node: ip-172-31-91-242.ec2.internal/172.31.91.242
Start Time: Wed, 31 Oct 2018 04:59:11 -0800
Labels: app=guestbook
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicationController/guestbook
Containers:
guestbook:
Container ID:
Image: k8s.gcr.io/guestbook:v3
Image ID:
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-jb75l (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-jb75l:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-jb75l
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 11m (x19561 over 13h) kubelet, ip-172-31-91-242.ec2.internal Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 74s (x19368 over 13h) kubelet, ip-172-31-91-242.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "guestbook-8k9pp_default" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"
The Kubernetes cluster that I created is on AWS EKS. The EKS cluster was created manually by me through the EKS console.
I have created a second cluster with the official VPC sample for EKS clusters (https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2018-08-30/amazon-eks-vpc-sample.yaml), and it seems to be working now.
So the problem should be in the VPC configuration. Once I figure out what actually went wrong, I will post the info here. Thank you.
I had a similar problem. Same error message, but much simpler set of Pods.
Using kubectl get pods --all-namespaces, I found that the pods in CrashLoopBackOff were all on one particular node.
I scaled-in my nodes, and then scaled-out again (effectively re-creating that node), and that problem seems to have gone away.
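If you want to do the same thing manually rather than by scaling, you can first confirm which node the failing pods are on and then cordon and drain it before replacing it (node name is a placeholder):
kubectl get pods --all-namespaces -o wide
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-local-data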

Kafka Pod doesn't start on GKE

I followed this tutorial, and when I tried to run it on GKE I was not able to start the Kafka pod.
It returns CrashLoopBackOff all the time, and I don't know how to view the pod's error logs.
Here is the result when I hit kubectl describe pod my-pod-xxx:
Name: kafka-broker1-54cb95fb44-hlj5b
Namespace: default
Node: gke-xxx-default-pool-f9e313ed-zgcx/10.146.0.4
Start Time: Thu, 25 Oct 2018 11:40:21 +0900
Labels: app=kafka
id=1
pod-template-hash=1076519600
Annotations: kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container kafka
Status: Running
IP: 10.48.8.10
Controlled By: ReplicaSet/kafka-broker1-54cb95fb44
Containers:
kafka:
Container ID: docker://88ee6a1df4157732fc32b7bd8a81e329dbdxxxx9cbe614689e775d183dbcd61
Image: wurstmeister/kafka
Image ID: docker-pullable://wurstmeister/kafka@sha256:4f600a95fa1288f7b1xxxxxa32ca00b4fb13b83b31533fa6b40499bd9bdf192f
Port: 9092/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 25 Oct 2018 14:35:32 +0900
Finished: Thu, 25 Oct 2018 14:35:51 +0900
Ready: False
Restart Count: 37
Requests:
cpu: 100m
Environment:
KAFKA_ADVERTISED_PORT: 9092
KAFKA_ADVERTISED_HOST_NAME: 35.194.100.32
KAFKA_ZOOKEEPER_CONNECT: zoo1:2181
KAFKA_BROKER_ID: 1
KAFKA_CREATE_TOPICS: topic1:3:3
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-w6s7n (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-w6s7n:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-w6s7n
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 5m (x716 over 2h) kubelet, gke-xxx-default-pool-f9e313ed-zgcx Back-off restarting failed container
Normal Pulling 36s (x38 over 2h) kubelet, gke-xxxdefault-pool-f9e313ed-zgcx pulling image "wurstmeister/kafka"
I noticed that the first run goes well, but after that the Node changes status to NotReady and the Kafka pod enters the CrashLoopBackOff state.
Here is the log before it goes down:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned kafka-broker1-54cb95fb44-wwf2h to gke-xxx-default-pool-f9e313ed-8mr6
Normal SuccessfulMountVolume 5m kubelet, gke-xxx-default-pool-f9e313ed-8mr6 MountVolume.SetUp succeeded for volume "default-token-w6s7n"
Normal Pulling 5m kubelet, gke-xxx-default-pool-f9e313ed-8mr6 pulling image "wurstmeister/kafka"
Normal Pulled 5m kubelet, gke-xxx-default-pool-f9e313ed-8mr6 Successfully pulled image "wurstmeister/kafka"
Normal Created 5m kubelet, gke-xxx-default-pool-f9e313ed-8mr6 Created container
Normal Started 5m kubelet, gke-xxx-default-pool-f9e313ed-8mr6 Started container
Normal NodeControllerEviction 38s node-controller Marking for deletion Pod kafka-broker1-54cb95fb44-wwf2h from Node gke-dev-centurion-default-pool-f9e313ed-8mr6
Could anyone tell me what's wrong with my pod and how I can catch the error for the pod failure?
I just figured out that my cluster's nodes did not have enough resources.
After creating a new cluster with more memory, it works.
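If you hit this again, a quick way to check whether the nodes are out of headroom is to compare what is already requested against each node's allocatable resources (kubectl top needs metrics-server or the GKE monitoring addon; node name is a placeholder):
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
kubectl top nodes
kubectl top pods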