CockroachDB Cluster on Kubernetes Pods Crashing

I'm trying to install a CockroachDB Helm chart on a 2 node Kubernetes cluster using this command:
helm install my-release --set statefulset.replicas=2 stable/cockroachdb
I have already created 2 persistent volumes:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv00001 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-0 11m
pv00002 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-1 11m
I'm getting a weird error and I'm new to Kubernetes so I'm not sure what I'm doing wrong. I've tried creating a StorageClass and using it with my PVs but then the CockroachDB PVCs won't bind to them. I suspect there may be something wrong with my PV setup?
I've tried using kubectl logs but the only error I'm seeing is this:
standard_init_linux.go:211: exec user process caused "exec format
error"
and the pods are crashing over and over:
NAME READY STATUS RESTARTS AGE
my-release-cockroachdb-0 0/1 Pending 0 11m
my-release-cockroachdb-1 0/1 CrashLoopBackOff 7 11m
my-release-cockroachdb-init-tfcks 0/1 CrashLoopBackOff 5 5m29s
Any idea why the pods are crashing?
Here's kubectl describe for the init pod:
Name: my-release-cockroachdb-init-tfcks
Namespace: default
Priority: 0
Node: axon/192.168.1.7
Start Time: Sat, 04 Apr 2020 00:22:19 +0100
Labels: app.kubernetes.io/component=init
app.kubernetes.io/instance=my-release
app.kubernetes.io/name=cockroachdb
controller-uid=54c7c15d-eb1c-4392-930a-d9b8e9225a45
job-name=my-release-cockroachdb-init
Annotations: <none>
Status: Running
IP: 10.44.0.1
IPs:
IP: 10.44.0.1
Controlled By: Job/my-release-cockroachdb-init
Containers:
cluster-init:
Container ID: docker://82a062c6862a9fd5047236feafe6e2654ec1f6e3064fd0513341a1e7f36eaed3
Image: cockroachdb/cockroach:v19.2.4
Image ID: docker-pullable://cockroachdb/cockroach#sha256:511b6d09d5bc42c7566477811a4e774d85d5689f8ba7a87a114b96d115b6149b
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
while true; do initOUT=$(set -x; /cockroach/cockroach init --insecure --host=my-release-cockroachdb-0.my-release-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 04 Apr 2020 00:28:04 +0100
Finished: Sat, 04 Apr 2020 00:28:04 +0100
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cz2sn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-cz2sn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cz2sn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/my-release-cockroachdb-init-tfcks to axon
Normal Pulled 5m9s (x5 over 6m45s) kubelet, axon Container image "cockroachdb/cockroach:v19.2.4" already present on machine
Normal Created 5m8s (x5 over 6m45s) kubelet, axon Created container cluster-init
Normal Started 5m8s (x5 over 6m44s) kubelet, axon Started container cluster-init
Warning BackOff 92s (x26 over 6m42s) kubelet, axon Back-off restarting failed container

When Pods crash, the most important things to check are their descriptions (kubectl describe) and their logs.
The logs of the failed Pod show that the architecture of the cockroach image doesn't match that of the node.
Run kubectl get po -o wide to see which nodes the cockroach pods run on, then check those nodes' architecture.
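For example, something along these lines can confirm an architecture mismatch (the node name axon is taken from the describe output above; the rest is a generic sketch):
# Which node is each cockroach pod scheduled on?
kubectl get po -o wide
# What CPU architecture does each node report?
kubectl get nodes -L kubernetes.io/arch
kubectl get node axon -o jsonpath='{.status.nodeInfo.architecture}'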

A 2-node CockroachDB cluster is an anti-pattern. You need 3 or more nodes so that the cluster can keep data available when a single node fails. Consider checking out these videos explaining how data in CockroachDB is organized and how the nodes in a cluster work together to keep data available in the face of node failure.
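For what it's worth, a minimal sketch of installing with three replicas instead of two (same chart and release name as in the question, only the replica count changed):
helm install my-release --set statefulset.replicas=3 stable/cockroachdb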

Only if you have 3 nodes (or more) will you not risk losing data if any of the nodes gets corrupted. Apart from that, it's easier to explain how to do it right than to find out what went wrong, and to find out what went wrong, one must go through the logs.
If you attach the log, I can take a look.
I also wrote a detailed guide that may address the "doing it right" part of my answer. I elaborated even more about the entire process here.

Related

kubernetes pod (mssql-tools) failing with CrashLoopBackOff error and restarting

I'm using Rancher Desktop for Kubernetes in WSL 2 on Windows 11.
I'm trying to create a pod using this simple YAML:
apiVersion: v1
kind: Pod
metadata:
  name: mssql-tools
  labels:
    name: mssql-tools
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
But it is continuously giving CrashLoopBackOff error.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mssql-tools 0/1 CrashLoopBackOff 11 (8s ago) 14m
And here is the result of kubectl describe pod mssql-tools:
$ kubectl describe pod mssql-tools
Name: mssql-tools
Namespace: default
Priority: 0
Service Account: default
Node: desktop-2ohsprk/172.22.97.204
Start Time: Mon, 26 Dec 2022 04:34:19 +0500
Labels: name=mssql-tools
Annotations: <none>
Status: Running
IP: 10.42.0.57
IPs:
IP: 10.42.0.57
Containers:
mssql-tools:
Container ID: docker://76343010f4344a5d26fb35f3b0278271d3336e8e10d695cc22e78520262f34bf
Image: mcr.microsoft.com/mssql-tools:latest
Image ID: docker-pullable://mcr.microsoft.com/mssql-tools#sha256:62556500522072535cb3df2bb5965333dded9be47000473e9e0f84118e248642
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 26 Dec 2022 04:46:20 +0500
Finished: Mon, 26 Dec 2022 04:46:20 +0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 26 Dec 2022 04:45:51 +0500
Finished: Mon, 26 Dec 2022 04:45:51 +0500
Ready: False
Restart Count: 9
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wkqlg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-wkqlg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned default/mssql-tools to desktop-2ohsprk
Normal Pulled 12m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 1.459473213s
Normal Pulled 12m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 823.403008ms
Normal Pulled 11m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 835.697509ms
Normal Pulled 11m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 873.802598ms
Normal Created 11m (x4 over 12m) kubelet Created container mssql-tools
Normal Started 11m (x4 over 12m) kubelet Started container mssql-tools
Normal Pulling 10m (x5 over 12m) kubelet Pulling image "mcr.microsoft.com/mssql-tools:latest"
Normal Pulled 10m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 740.64559ms
Warning BackOff 6m56s (x25 over 11m) kubelet Back-off restarting failed container
Normal SandboxChanged 50s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 48s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 951.332457ms
Normal Pulled 32s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 828.839917ms
Normal Pulling 4s (x3 over 49s) kubelet Pulling image "mcr.microsoft.com/mssql-tools:latest"
Normal Pulled 3s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 713.951656ms
Normal Created 3s (x3 over 48s) kubelet Created container mssql-tools
Normal Started 3s (x3 over 48s) kubelet Started container mssql-tools
Warning BackOff 2s (x5 over 47s) kubelet Back-off restarting failed container
The same container works perfectly if I run it via docker and I can use its shell to execute sqlcmd properly.
I can't figure out any reason for this.
Any help would be really appreciated.
Thanks
CrashLoopBackOff is a common error indicating that a pod failed to start and continued to fail repeatedly when Kubernetes tried to restart it.
To troubleshoot this issue, follow the steps below (a consolidated sketch of the commands follows this list):
Check for "Back-off restarting failed container" by running kubectl describe pod [name].
If you get Liveness probe failed and Back-off restarting failed container messages from the kubelet, this indicates the container is not responding and is in the process of restarting.
Check the previous container instance. Run kubectl get pods to identify the pod that causes the CrashLoopBackOff error, then run kubectl logs --previous --tail 10 to get the last ten log lines from that pod.
Check the deployment logs by running: kubectl logs -f deploy/<deployment-name> -n <namespace>
Refer to this link for more detailed troubleshooting steps.
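For reference, a minimal sketch of those commands in one place (the pod name is taken from the question above; the deployment and namespace names are placeholders):
# Describe the failing pod and look for "Back-off restarting failed container"
kubectl describe pod mssql-tools
# Last ten log lines from the previous (crashed) container instance
kubectl logs mssql-tools --previous --tail 10
# Follow the logs of a deployment in a given namespace
kubectl logs -f deploy/my-deployment -n my-namespace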
So after trying and digging through multiple options, it finally worked by executing the command sleep 3600000, i.e. giving the container a long-running command so that the pod keeps running instead of exiting as soon as it starts.
Here is the working yaml:
apiVersion: v1
kind: Pod
metadata:
  name: mssql-tools
  labels:
    name: mssql-tools
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
    command: ["sleep"]
    args:
    - "3600000"
    imagePullPolicy: IfNotPresent
The command and its arguments can also be specified like this:
apiVersion: v1
...
...
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
    command:
    - sleep
    - "3600000"
...
And by the way, you can also deploy a container by passing a command on the kubectl run command line, i.e.:
kubectl run mssql --image=mcr.microsoft.com/mssql-tools -n myNameSpace --command -- sleep 3600000
Note: You can omit -n myNameSpace if you are not deploying it in a specific namespace, or if you are deploying it in the default namespace.
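Once the pod stays up with any of the variants above, you can use its shell to run sqlcmd much like with plain Docker; a rough sketch (the server name and credentials below are hypothetical, not from the question):
# Open a shell in the running pod
kubectl exec -it mssql-tools -- bash
# Inside the pod, sqlcmd is installed under /opt/mssql-tools/bin
/opt/mssql-tools/bin/sqlcmd -S my-sql-server -U sa -P 'MyP@ssw0rd'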

Pod with Debian image is getting created but container is continuously crashing

Below is my Pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: pod-debian-container
spec:
  containers:
  - name: pi
    image: debian
    command: ["/bin/echo"]
    args: ["Hello, World."]
And below is the output of "describe" command for this Pod:
C:\Users\so.user\Desktop>kubectl describe pod/pod-debian-container
Name: pod-debian-container
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Mon, 15 Feb 2021 21:47:43 +0530
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.244.0.21
IPs:
IP: 10.244.0.21
Containers:
pi:
Container ID: cri-o://f9081af183308f01bf1de6108b2c988e6bcd11ab2daedf983e99e1f4d862981c
Image: debian
Image ID: docker.io/library/debian#sha256:102ab2db1ad671545c0ace25463c4e3c45f9b15e319d3a00a1b2b085293c27fb
Port: <none>
Host Port: <none>
Command:
/bin/echo
Args:
Hello, World.
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 15 Feb 2021 21:56:49 +0530
Finished: Mon, 15 Feb 2021 21:56:49 +0530
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-sxlc9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-sxlc9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-sxlc9
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned default/pod-debian-container to minikube
Normal Pulled 15m kubelet Successfully pulled image "debian" in 11.1633901s
Normal Pulled 15m kubelet Successfully pulled image "debian" in 11.4271866s
Normal Pulled 14m kubelet Successfully pulled image "debian" in 11.0252907s
Normal Pulled 14m kubelet Successfully pulled image "debian" in 11.1897469s
Normal Started 14m (x4 over 15m) kubelet Started container pi
Normal Pulling 13m (x5 over 15m) kubelet Pulling image "debian"
Normal Created 13m (x5 over 15m) kubelet Created container pi
Normal Pulled 13m kubelet Successfully pulled image "debian" in 9.1170801s
Warning BackOff 5m25s (x31 over 15m) kubelet Back-off restarting failed container
Warning Failed 10s kubelet Error: ErrImagePull
And below is another output:
C:\Users\so.user\Desktop>kubectl get pod,job,deploy,rs
NAME READY STATUS RESTARTS AGE
pod/pod-debian-container 0/1 CrashLoopBackOff 6 15m
Below are my questions:
I can see that the Pod is running but the container inside it is crashing. I can't understand why, because I can see that the Debian image is successfully pulled.
As you can see in the "kubectl get pod,job,deploy,rs" output, RESTARTS is equal to 6. Is it the Pod that has restarted 6 times, or the container?
Why did 6 restarts happen? I didn't mention anything about restarts in my spec.
This looks like a liveness problem related to the CrashLoopBackOff. Have you considered taking a look at this blog? It explains very well how to debug the problem.

How to keep my pod running (not get terminate/restart) in EKS

I am trying to use AWS EKS (Fargate) to run automation cases, but some pods (9 out of 10 runs) get terminated, which makes the automation fail.
I have a bunch of automation cases written in Robot Framework. The cases themselves run well, but they are time-consuming, usually needing 6 hours for a round. So I thought I could use K8S to run the cases in parallel and save time; I use Jenkins to configure how many 'automations' run in parallel, and after all are done, I merge and present the test results.
But some pods often get terminated.
The command "kubectl get pod" returns something like this (I set "restartPolicy: Never" to keep the failed pod around so I can 'describe' it; otherwise the pod is just gone):
box6 0/1 Error 0 9m39s
command "kubectl describe pod box6" get output like following (masked some private information).
Name: box6
Namespace: default
Priority: 2000001000
Priority Class Name: system-node-critical
Node: XXXXXXXX
Start Time: Mon, 21 Dec 2020 15:29:37 +0800
Labels: eks.amazonaws.com/fargate-profile=eksautomation-profile
name=box6
purpose=demonstrate-command
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"name":"box6","purpose":"demonstrate-command"},"name":"box6","names...
kubernetes.io/psp: eks.privileged
Status: Failed
IP: 192.168.183.226
IPs:
IP: 192.168.183.226
Containers:
box6:
Container ID: XXXXXXXX
Image: XXXXXXXX
Image ID: XXXXXXXX
Port: <none>
Host Port: <none>
Command:
/bin/initMock.sh
State: Terminated
Reason: Error
Exit Code: 143
Started: Mon, 21 Dec 2020 15:32:12 +0800
Finished: Mon, 21 Dec 2020 15:34:09 +0800
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tsk7j (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-tsk7j:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tsk7j
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled <unknown> fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled <unknown> fargate-scheduler Successfully assigned default/box6 to XXXXXXXX
Normal Pulling 5m7s kubelet, XXXXXXXX Pulling image "XXXXXXXX"
Normal Pulled 2m38s kubelet, XXXXXXXX Successfully pulled image "XXXXXXXX"
Normal Created 2m38s kubelet, XXXXXXXX Created container box6
Normal Started 2m37s kubelet, XXXXXXXX Started container box6
I did some searching on the error; exit code 143 is 128+SIGTERM, so I suspect the pod is being killed by EKS intentionally.
I cannot configure the pod to restart, because if it restarted, the automation case could not resume, which would defeat the purpose (no automation running time would be saved).
I have tried enabling CloudWatch, hoping to get a clue about why the pod gets terminated, but found nothing.
Why does my pod get terminated by EKS? How should I troubleshoot it? How can I avoid it?
Thanks for your help.

Error imagepullbackoff when scaling working deployment Kubernetes

I have a working deployment of a Docker image on Kubernetes. However, when I want to scale it I receive an error that it cannot find my image (even though I'm scaling something that's already working from that image).
Here is the command I'm using to scale the deployment:
./kubectl scale deployments/mautic --replicas=2
Here is the output when I run kubectl describe:
Name: mautic-3389378641-jgm9b
Namespace: default
Node: minikube/192.168.99.101
Start Time: Fri, 01 Sep 2017 14:34:08 +0100
Labels: app=mautic
pod-template-hash=3389378641
tier=frontend
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"mautic-3389378641","uid":"52a87ff6-8f06-11e7-8fbc-080027cd66fa",...
Status: Pending
IP: 172.17.0.8
Created By: ReplicaSet/mautic-3389378641
Controlled By: ReplicaSet/mautic-3389378641
Containers:
mautic:
Container ID:
Image: mautic/mautic:latest
Image ID:
Port: 80/TCP
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Environment:
MAUTIC_DB_HOST: mautic-mysql
MAUTIC_DB_PASSWORD: <set to the key 'password.txt' in secret 'mysql-pass'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9drh0 (ro)
/var/www/html from mautic-local-storage (rw)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
mautic-local-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mautic-lv-claim
ReadOnly: false
default-token-9drh0:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9drh0
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
47m 47m 1 default-scheduler Normal Scheduled Successfully assigned mautic-3389378641-jgm9b to minikube
47m 47m 1 kubelet, minikube Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "pvc-5b7c14a3-8f03-11e7-8fbc-080027cd66fa"
47m 47m 1 kubelet, minikube Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-9drh0"
47m 7m 12 kubelet, minikube spec.containers{mautic} Normal Pulling pulling image "mautic/mautic:latest"
47m 6m 12 kubelet, minikube spec.containers{mautic} Warning Failed Failed to pull image "mautic/mautic:latest": rpc error: code = 2 desc = Network timed out while trying to connect to https://index.docker.io/v1/repositories/mautic/mautic/images. You may want to check your internet connection or if you are behind a proxy.
47m 6m 170 kubelet, minikube Warning FailedSync Error syncing pod
47m 6m 158 kubelet, minikube spec.containers{mautic} Normal BackOff Back-off pulling image "mautic/mautic:latest"
But the Mautic image referenced is right here and is already in use with the deployment I want to scale.
REPOSITORY TAG IMAGE ID CREATED SIZE
testimage v0 0e0d4b13c0c2 10 days ago 611MB
mautic/mautic latest 730d2796f904 2 weeks ago 611MB
mysql 5.6 cdfa8cc50c33 5 weeks ago 298MB
mysql latest c73c7527c03a 5 weeks ago 412MB
gcr.io/google_containers/k8s-dns-sidecar-amd64 1.14.4 38bac66034a6 2 months ago 41.8MB
gcr.io/google_containers/k8s-dns-kube-dns-amd64 1.14.4 a8e00546bcf3 2 months ago 49.4MB
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64 1.14.4 f7f45b9cb733 2 months ago 41.4MB
gcr.io/google-containers/kube-addon-manager v6.4-beta.2 0a951668696f 2 months ago 79.2MB
gcr.io/google_containers/kubernetes-dashboard-amd64 v1.6.1 71dfe833ce74 3 months ago 134MB
autoize/mautic latest 6c99d7ce1a07 4 months ago 665MB
gcr.io/google_containers/pause-amd64 3.0 99e59f495ffa 16 months ago 747kB
Does anyone have any idea why this isn't working?
Your host has lost connectivity to Docker Hub - try running "docker pull mautic/mautic:latest" and see if that works. This may be a network issue on your host, a problem with an intermediate proxy between your host and Docker Hub or (less likely) a temporary outage at Docker Hub or something else.
Since you're using the latest tag it is likely that your Deployment is using imagePullPolicy=Always. (docs: Defaults to Always if :latest tag is specified, or IfNotPresent otherwise.) I'd suggest explicitly setting imagePullPolicy=IfNotPresent in your Deployment spec so that the local image that already exists is used when starting a new container.
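For illustration, a rough sketch of where that setting would go in the Deployment spec (the surrounding fields here are hypothetical; only the image and labels echo the question):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mautic
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mautic
  template:
    metadata:
      labels:
        app: mautic
    spec:
      containers:
      - name: mautic
        image: mautic/mautic:latest
        # Use the locally cached image instead of pulling on every container start
        imagePullPolicy: IfNotPresent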

kubernetes local cluster create pods got errors like ‘ErrImagePull’ and ‘ImagePullBackOff’

I just installed a kubernetes local cluster, but when I tried the command
cluster/kubectl.sh run my-nginx --image=nginx --replicas=2 --port=80
to create and run pods, here is what I got:
NAME READY STATUS RESTARTS AGE
my-nginx-00t7f 0/1 ContainerCreating 0 23m
my-nginx-spy2b 0/1 ContainerCreating 0 23m
and when I used kubectl logs, I got
Pod "my-nginx-00t7f" in namespace "default" : pod is not in 'Running', 'Succeeded' or 'Failed' state - State: "Pending"
Seems it got stuck in 'pending' status.
Then I used 'kubectl describe' and got
Name: my-nginx-00t7f
Namespace: default
Image(s): nginx
Node: 127.0.0.1/127.0.0.1
Start Time: Thu, 17 Dec 2015 22:27:18 +0800
Labels: run=my-nginx
Status: Pending
Reason:
Message:
IP:
Replication Controllers: my-nginx (2/2 replicas created)
Containers:
my-nginx:
Container ID:
Image: nginx
Image ID:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
default-token-p09p6:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-p09p6
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
26m 26m 1 {scheduler } Normal Scheduled Successfully assigned my-nginx-00t7f to 127.0.0.1
22m 1m 79 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: ImagePullBackOff
24m 5s 8 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: ErrImagePull
It seems my Docker cannot pull images, but actually it can; there is no problem when I run docker pull nginx.
I assume that you figured out from the kubelet logs that it was the pause container that couldn't be pulled.
Kubernetes needs to create a container for the pod in order to hold shared resources, such as the network namespace. It uses the pause container for this, which is a very small container that just sleeps forever.
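If that's the situation, a quick sketch of how to check and work around it (this assumes Docker is the container runtime and that the kubelet logs to systemd; the pause image name and tag vary by Kubernetes version, so the one below is only an example):
# Look for pause-image pull failures in the kubelet output
journalctl -u kubelet | grep -i pause
# Pre-pull the pause image manually so the kubelet finds it locally
docker pull gcr.io/google_containers/pause-amd64:3.0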
If your container remains in Pending status, then please check the kube-scheduler service. If it is in a stopped state, turn it on and check again.
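A rough sketch of how you might check that, depending on how the control plane is run (as pods in kube-system or as systemd services; both variants below are generic examples, not taken from the question):
# If the control-plane components run as pods:
kubectl get pods -n kube-system
# If the scheduler runs as a systemd service:
systemctl status kube-scheduler
sudo systemctl restart kube-scheduler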