Getting CrashLoopBackOff error when deploying a pod - kubernetes

I am new to Kubernetes and am trying to deploy a pod that uses an image from a private registry. Whenever I apply this YAML it goes into a crash loop. I added a sleep with a large value thinking that might fix it, but it still hasn't worked.
apiVersion: v1
kind: Pod
metadata:
  name: privetae-image-testing
spec:
  containers:
  - name: private-image-test
    image: buildforjenkin.azurecr.io/nginx:latest
    imagePullPolicy: IfNotPresent
    command: ['echo','success','sleep 1000000']
Here are the logs:
Name: privetae-image-testing
Namespace: default
Priority: 0
Node: docker-desktop/192.168.65.4
Start Time: Sun, 24 Oct 2021 15:52:25 +0530
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.1.1.49
IPs:
IP: 10.1.1.49
Containers:
private-image-test:
Container ID: docker://46520936762f17b70d1ec92a121269e90aef2549390a14184e6c838e1e6bafec
Image: buildforjenkin.azurecr.io/nginx:latest
Image ID: docker-pullable://buildforjenkin.azurecr.io/nginx@sha256:7250923ba3543110040462388756ef099331822c6172a050b12c7a38361ea46f
Port: <none>
Host Port: <none>
Command:
echo
success
sleep 1000000
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 24 Oct 2021 15:52:42 +0530
Finished: Sun, 24 Oct 2021 15:52:42 +0530
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ld6zz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-ld6zz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34s default-scheduler Successfully assigned default/privetae-image-testing to docker-desktop
Normal Pulled 17s (x3 over 33s) kubelet Container image "buildforjenkin.azurecr.io/nginx:latest" already present on machine
Normal Created 17s (x3 over 33s) kubelet Created container private-image-test
Normal Started 17s (x3 over 33s) kubelet Started container private-image-test
Warning BackOff 2s (x5 over 31s) kubelet Back-off restarting failed container
I am running the cluster on docker-desktop on Windows. TIA

Notice that you are using the standard nginx image? Try deleting your pod and re-applying with:
apiVersion: v1
kind: Pod
metadata:
  name: private-image-testing
  labels:
    run: my-nginx
spec:
  restartPolicy: Always
  containers:
  - name: private-image-test
    image: buildforjenkin.azurecr.io/nginx:latest
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
      name: http
If your pod runs, you should be able to shell into it with kubectl exec -it private-image-testing -- sh, and then wget -O- localhost should print you a welcome message. If it still fails, paste the output of kubectl logs -f -l run=my-nginx into your question.

Check my previous answer to understand step by step what is going on after you launch the container.
You are launching an nginx:latest container whose main process is meant to run forever, precisely so that the container does not exit. Then you override it with a command that (I will quote David) will print the words success and sleep 1000000, and having printed those words, then exit. Note that 'sleep 1000000' here is just a third argument passed to echo, not a sleep command.
Instead of keeping your container running so it can serve, you are explicitly shooting yourself in the foot by letting the main process finish.
And sure enough, your command is executed and the container exits. Check the describe output: it exited correctly with status 0 and has already done that 2 times. It will keep doing so.
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 24 Oct 2021 15:52:42 +0530
Finished: Sun, 24 Oct 2021 15:52:42 +0530
You need to think hard about whether you really need command: ['echo','success','sleep 1000000']
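If the goal really was to print something and then keep the container alive, the command has to go through a shell, because as written 'sleep 1000000' is just another argument handed to echo. A minimal sketch of that (same image and names as your pod, only the command changed; it assumes the image ships a shell, which the official nginx image does):
apiVersion: v1
kind: Pod
metadata:
  name: private-image-testing
spec:
  containers:
  - name: private-image-test
    image: buildforjenkin.azurecr.io/nginx:latest
    imagePullPolicy: IfNotPresent
    # run both steps inside a shell so the sleep actually blocks and keeps PID 1 alive
    command: ['sh', '-c', 'echo success && sleep 1000000']
If you only want nginx to serve traffic, drop command entirely and let the image's default entrypoint run, as in the other answer above.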

Related

CrashLoopBackOff : Back-off restarting failed container for flask application

I am a beginner in kubernetes and was trying to deploy my flask application following this guide: https://medium.com/analytics-vidhya/build-a-python-flask-app-and-deploy-with-kubernetes-ccc99bbec5dc
I have successfully built a docker image and pushed it to dockerhub https://hub.docker.com/repository/docker/beatrix1997/kubernetes_flask_app
but am having trouble debugging a pod.
This is my yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubernetesflaskapp-deploy
  labels:
    app: kubernetesflaskapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubernetesflaskapp
  template:
    metadata:
      labels:
        app: kubernetesflaskapp
    spec:
      containers:
      - name: kubernetesflaskapp
        image: beatrix1997/kubernetes_flask_app
        ports:
        - containerPort: 5000
And this is the description of the pod:
Name: kubernetesflaskapp-deploy-5764bbbd44-8696k
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Fri, 20 May 2022 11:26:33 +0100
Labels: app=kubernetesflaskapp
pod-template-hash=5764bbbd44
Annotations: <none>
Status: Running
IP: 172.17.0.12
IPs:
IP: 172.17.0.12
Controlled By: ReplicaSet/kubernetesflaskapp-deploy-5764bbbd44
Containers:
kubernetesflaskapp:
Container ID: docker://d500dc15e389190670a9273fea1d70e6bd6ab2e7053bd2480d114ad6150830f1
Image: beatrix1997/kubernetes_flask_app
Image ID: docker-pullable://beatrix1997/kubernetes_flask_app@sha256:1bfa98229f55b04f32a6b85d72860886abcc0f17295b14e173151a8e4b0f0334
Port: 5000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 20 May 2022 11:58:38 +0100
Finished: Fri, 20 May 2022 11:58:38 +0100
Ready: False
Restart Count: 11
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zq8n7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-zq8n7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 33m default-scheduler Successfully assigned default/kubernetesflaskapp-deploy-5764bbbd44-8696k to minikube
Normal Pulled 33m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 14.783413947s
Normal Pulled 33m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.243534487s
Normal Pulled 32m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.373217701s
Normal Pulling 32m (x4 over 33m) kubelet Pulling image "beatrix1997/kubernetes_flask_app"
Normal Created 32m (x4 over 33m) kubelet Created container kubernetesflaskapp
Normal Pulled 32m kubelet Successfully pulled image "beatrix1997/kubernetes_flask_app" in 1.239794774s
Normal Started 32m (x4 over 33m) kubelet Started container kubernetesflaskapp
Warning BackOff 3m16s (x138 over 33m) kubelet Back-off restarting failed container
I am using ubuntu as my OS if it matters at all.
Any help would be appreciated!
Many thanks!
I would check the following:
Check if your Docker image works in Docker itself; you can run it with the docker run command (see the official doc, and the sketch below).
If it doesn't work there, then check what is wrong in your app first.
If it does, try checking the readiness and liveness probes (see the official documentation).
You can find more hints about failing pods here.
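For the first check, a rough sketch of what that looks like locally (the port mapping assumes the Flask app listens on 5000, matching the containerPort in your Deployment):
# run the image locally and watch whether the process stays up
docker run --rm -p 5000:5000 beatrix1997/kubernetes_flask_app
# in another terminal, confirm the app answers
curl http://localhost:5000/
If the container prints its output and exits immediately here too, the problem is in the image itself (for example a CMD that runs a script which finishes instead of starting the Flask server), not in Kubernetes.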
The error may be due to an issue in the application itself, since the reported reason is "Back-off restarting failed container". Please paste the output of the following command into the question for further clarification:
kubectl logs -n <NS> <pod-name>

kubernetes: when pod in CrashLoopBackOff status, related events won't update?

I'm testing Kubernetes behavior when a pod hits an error.
I now have a pod in CrashLoopBackOff status caused by a failing liveness probe. From what I can see in the Kubernetes events, the pod goes into CrashLoopBackOff after 3 tries and begins backing off its restarts, but the related "Liveness probe failed" events won't update?
➜ ~ kubectl describe pods/my-nginx-liveness-err-59fb55cf4d-c6p8l
Name: my-nginx-liveness-err-59fb55cf4d-c6p8l
Namespace: default
Priority: 0
Node: minikube/192.168.99.100
Start Time: Thu, 15 Jul 2021 12:29:16 +0800
Labels: pod-template-hash=59fb55cf4d
run=my-nginx-liveness-err
Annotations: <none>
Status: Running
IP: 172.17.0.3
IPs:
IP: 172.17.0.3
Controlled By: ReplicaSet/my-nginx-liveness-err-59fb55cf4d
Containers:
my-nginx-liveness-err:
Container ID: docker://edc363b76811fdb1ccacdc553d8de77e9d7455bb0d0fb3cff43eafcd12ee8a92
Image: nginx
Image ID: docker-pullable://nginx@sha256:353c20f74d9b6aee359f30e8e4f69c3d7eaea2f610681c4a95849a2fd7c497f9
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 15 Jul 2021 13:01:36 +0800
Finished: Thu, 15 Jul 2021 13:02:06 +0800
Ready: False
Restart Count: 15
Liveness: http-get http://:8080/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r7mh4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-r7mh4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37m default-scheduler Successfully assigned default/my-nginx-liveness-err-59fb55cf4d-c6p8l to minikube
Normal Created 35m (x4 over 37m) kubelet Created container my-nginx-liveness-err
Normal Started 35m (x4 over 37m) kubelet Started container my-nginx-liveness-err
Normal Killing 35m (x3 over 36m) kubelet Container my-nginx-liveness-err failed liveness probe, will be restarted
Normal Pulled 31m (x7 over 37m) kubelet Container image "nginx" already present on machine
Warning Unhealthy 16m (x32 over 36m) kubelet Liveness probe failed: Get "http://172.17.0.3:8080/": dial tcp 172.17.0.3:8080: connect: connection refused
Warning BackOff 118s (x134 over 34m) kubelet Back-off restarting failed container
The BackOff event was updated 118s ago, but the Unhealthy event was last updated 16m ago?
And why is the Restart Count only 15 while the BackOff event shows 134 occurrences?
I'm using minikube and my deployment is like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-liveness-err
spec:
  selector:
    matchLabels:
      run: my-nginx-liveness-err
  replicas: 1
  template:
    metadata:
      labels:
        run: my-nginx-liveness-err
    spec:
      containers:
      - name: my-nginx-liveness-err
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 8080
I think you might be confusing Status Conditions and Events. Events don't "update"; they just exist. They are a stream of event data from the controllers for debugging or alerting on. The Age column is the relative timestamp of the most recent instance of that event type, and you can see it does some basic de-duplication. Events also age out after a few hours to keep the database from exploding.
So your issue has nothing to do with the liveness probe; your container is crashing on startup.
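If you want to see the raw event records instead of the de-duplicated describe view, something roughly like this works (the pod name is taken from your describe output):
kubectl get events \
  --field-selector involvedObject.name=my-nginx-liveness-err-59fb55cf4d-c6p8l \
  --sort-by=.lastTimestamp
Each event object carries a count plus first/last timestamps (visible with -o wide or -o yaml), which is exactly the de-duplication described above: the Age column in describe shows the most recent occurrence of each event type, not every individual occurrence.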

Kubernetes CronJob Not exited

I am running a cronjob in Kubernetes. The cronjob started but never exited. The status of the pod is always RUNNING.
Below is logs
kubectl get pods
cronjob-1623253800-xnwwx 1/1 Running 0 13h
When i describe the JOB below are noticed
kubectl describe job cronjob-1623300120
Name: cronjob-1623300120
Namespace: cronjob
Selector: xxxxx
Labels: xxxxx
Annotations: <none>
Controlled By: CronJob/cronjob
Parallelism: 1
Completions: 1
Start Time: Thu, 9 Jun 2021 10:12:03 +0530
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=cronjob
controller-xxxx
job-name=cronjob-1623300120
Containers:
plannercronjob:
Image: xxxxxxxxxxxxx
Port: <none>
Host Port: <none>
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 13h job-controller Created pod: cronjob-1623300120
I noticed Pods Statuses: 1 Running / 0 Succeeded / 0 Failed. Does this mean that the job counts as Succeeded when the code returns zero (and Failed otherwise)? Is that correct?
When I enter the pod using the exec command
kubectl exec --stdin --tty cronjob-1623253800-xnwwx -n cronjob -- /bin/bash
root@cronjob-1623253800-xnwwx:/# ps ax| grep python
1 ? Ssl 0:01 python -m sfit.src.app
18 pts/0 S+ 0:00 grep python
I found that the python process is still running. Is this a deadlock in my code or something else?
pod describe
Name: cronjob-1623302220-xnwwx
Namespace: default
Priority: 0
Node: aks-agentpool-xxxxvmss000000/10.240.0.4
Start Time: Thu, 9 Jun 2021 10:47:02 +0530
Labels: app=cronjob
controller-uid=xxxxxx
job-name=cronjob-1623302220
Annotations: <none>
Status: Running
IP: 10.244.1.30
IPs:
IP: 10.244.1.30
Controlled By: Job/cronjob-1623302220
Containers:
plannercronjob:
Container ID: docker://xxxxxxxxxxxxxxxx
Image: xxxxxxxxxxx
Image ID: docker-xxxx
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 9 Jun 2021 10:47:06 +0530
Ready: True
Restart Count: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-97xzv (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-97xzv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-97xzv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13h default-scheduler Successfully assigned cronjob/cronjob-1623302220-xnwwx to aks-agentpool-xxx-vmss000000
Normal Pulling 13h kubelet, aks-agentpool-xxx-vmss000000 Pulling image "xxxx.azurecr.io/xxx:1.1.1"
Normal Pulled 13h kubelet, aks-agentpool-xxx-vmss000000 Successfully pulled image "xxx.azurecr.io/xx:1.1.1"
Normal Created 13h kubelet, aks-agentpool-xxx-vmss000000 Created container cronjob
Normal Started 13h kubelet, aks-agentpool-xxx-vmss000000 Started container cronjob
@KrishnaChaurasia: I ran the docker image on my system. There is an error in my python code, and locally the container exits with that error. But in Kubernetes it does not exit and does not stop:
docker run xxxxx/cronjob:1
File "/usr/local/lib/python3.8/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 261, in send
raise error
azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7f113f6480a0>: Failed to establish a new connection: [Errno -2] Name or service not known
echo $?
1
If you are seeing that your pod is always running and never completes, try adding startingDeadlineSeconds.
https://medium.com/@hengfeng/what-does-kubernetes-cronjobs-startingdeadlineseconds-exactly-mean-cc2117f9795f
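For reference, startingDeadlineSeconds sits on the CronJob spec itself, not on the pod. A rough sketch (the schedule and values are illustrative, not your real manifest; use batch/v1beta1 on clusters older than 1.21):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob
spec:
  schedule: "*/30 * * * *"
  startingDeadlineSeconds: 300      # skip a scheduled run if it cannot start within 5 minutes
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600   # separately, bound how long a started job may run
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: plannercronjob
            image: xxxxxxxxxxxxx
Note that if the real problem is a pod whose main process never exits, it is activeDeadlineSeconds (a Job-level field, shown above for contrast) that caps the run time; startingDeadlineSeconds only affects whether a missed run is started at all.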

How to see Pod logs: a container name must be specified for pod... choose one of: [wait main]

I am running an Argo workflow and getting the following error in the pod's log:
error: a container name must be specified for pod <name>, choose one of: [wait main]
This error only happens some of the time and only with some of my templates, but when it does, it is a template that is run later in the workflow (i.e. not the first template run). I have not yet been able to identify the parameters that will run successfully, so I will be happy with tips for debugging. I have pasted the output of describe below.
Based on searches, I think the solution is simply that I need to attach "-c main" somewhere, but I do not know where and cannot find information in the Argo docs.
Describe:
Name: message-passing-1-q8jgn-607612432
Namespace: argo
Priority: 0
Node: REDACTED
Start Time: Wed, 17 Mar 2021 17:16:37 +0000
Labels: workflows.argoproj.io/completed=false
workflows.argoproj.io/workflow=message-passing-1-q8jgn
Annotations: cni.projectcalico.org/podIP: 192.168.40.140/32
cni.projectcalico.org/podIPs: 192.168.40.140/32
workflows.argoproj.io/node-name: message-passing-1-q8jgn.e
workflows.argoproj.io/outputs: {"exitCode":"6"}
workflows.argoproj.io/template:
{"name":"egress","arguments":{},"inputs":{...
Status: Failed
IP: 192.168.40.140
IPs:
IP: 192.168.40.140
Controlled By: Workflow/message-passing-1-q8jgn
Containers:
wait:
Container ID: docker://26d6c30440777add2af7ef3a55474d9ff36b8c562d7aecfb911ce62911e5fda3
Image: argoproj/argoexec:v2.12.10
Image ID: docker-pullable://argoproj/argoexec@sha256:6edb85a84d3e54881404d1113256a70fcc456ad49c6d168ab9dfc35e4d316a60
Port: <none>
Host Port: <none>
Command:
argoexec
wait
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 17 Mar 2021 17:16:43 +0000
Finished: Wed, 17 Mar 2021 17:17:03 +0000
Ready: False
Restart Count: 0
Environment:
ARGO_POD_NAME: message-passing-1-q8jgn-607612432 (v1:metadata.name)
Mounts:
/argo/podmetadata from podmetadata (rw)
/mainctrfs/mnt/logs from log-p1-vol (rw)
/mainctrfs/mnt/processed from processed-p1-vol (rw)
/var/run/docker.sock from docker-sock (ro)
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-v2w56 (ro)
main:
Container ID: docker://67e6d6d3717ab1080f14cac6655c90d990f95525edba639a2d2c7b3170a7576e
Image: REDACTED
Image ID: REDACTED
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
State: Terminated
Reason: Error
Exit Code: 6
Started: Wed, 17 Mar 2021 17:16:43 +0000
Finished: Wed, 17 Mar 2021 17:17:03 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt/logs/ from log-p1-vol (rw)
/mnt/processed/ from processed-p1-vol (rw)
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-v2w56 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
podmetadata:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
docker-sock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType: Socket
processed-p1-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: message-passing-1-q8jgn-processed-p1-vol
ReadOnly: false
log-p1-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: message-passing-1-q8jgn-log-p1-vol
ReadOnly: false
argo-token-v2w56:
Type: Secret (a volume populated by a Secret)
SecretName: argo-token-v2w56
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m35s default-scheduler Successfully assigned argo/message-passing-1-q8jgn-607612432 to ack1
Normal Pulled 7m31s kubelet Container image "argoproj/argoexec:v2.12.10" already present on machine
Normal Created 7m31s kubelet Created container wait
Normal Started 7m30s kubelet Started container wait
Normal Pulled 7m30s kubelet Container image already present on machine
Normal Created 7m30s kubelet Created container main
Normal Started 7m30s kubelet Started container main
This happens when you try to view logs for a pod with multiple containers without specifying which container you want the logs from. The typical command to see logs is:
kubectl logs <podname>
But your Pod has two containers, one named "wait" and one named "main". You can see the logs from the container named "main" with:
kubectl logs <podname> -c main
or you can see the logs from all containers with:
kubectl logs <podname> --all-containers
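Since these are Argo workflow pods, you can also pull the main-container logs for the whole workflow at once using the label shown in your describe output; a sketch:
kubectl logs -n argo -l workflows.argoproj.io/workflow=message-passing-1-q8jgn -c main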

How to keep my pod running (not get terminate/restart) in EKS

I am trying to use AWS EKS (Fargate) to run automation cases, but some pods (9 out of 10 runs) get terminated, which makes the automation fail.
I have a bunch of automation cases written in robotframework. The cases themselves run well, but they are time-consuming: a full round usually needs 6 hours. So I think I can use K8S to run the cases in parallel and save time. I use Jenkins to configure how many 'automations' run in parallel, and after they are all done, to merge and present the test results.
But some pods often get terminated.
The command "kubectl get pod" returns something like this (I set "restartPolicy: Never" to keep the failed pod around so I can 'describe' it; otherwise the pod is just gone):
box6 0/1 Error 0 9m39s
command "kubectl describe pod box6" get output like following (masked some private information).
Name: box6
Namespace: default
Priority: 2000001000
Priority Class Name: system-node-critical
Node: XXXXXXXX
Start Time: Mon, 21 Dec 2020 15:29:37 +0800
Labels: eks.amazonaws.com/fargate-profile=eksautomation-profile
name=box6
purpose=demonstrate-command
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"name":"box6","purpose":"demonstrate-command"},"name":"box6","names...
kubernetes.io/psp: eks.privileged
Status: Failed
IP: 192.168.183.226
IPs:
IP: 192.168.183.226
Containers:
box6:
Container ID: XXXXXXXX
Image: XXXXXXXX
Image ID: XXXXXXXX
Port: <none>
Host Port: <none>
Command:
/bin/initMock.sh
State: Terminated
Reason: Error
Exit Code: 143
Started: Mon, 21 Dec 2020 15:32:12 +0800
Finished: Mon, 21 Dec 2020 15:34:09 +0800
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tsk7j (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-tsk7j:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tsk7j
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled <unknown> fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled <unknown> fargate-scheduler Successfully assigned default/box6 to XXXXXXXX
Normal Pulling 5m7s kubelet, XXXXXXXX Pulling image "XXXXXXXX"
Normal Pulled 2m38s kubelet, XXXXXXXX Successfully pulled image "XXXXXXXX"
Normal Created 2m38s kubelet, XXXXXXXX Created container box6
Normal Started 2m37s kubelet, XXXXXXXX Started container box6
I did some searching on the error: exit code 143 is 128+SIGTERM, so I suspect the pod is being killed by EKS intentionally.
I cannot configure the pod to restart, because if it restarts the automation case cannot resume, which defeats the purpose (no automation running time is saved).
I have tried enabling CloudWatch, hoping to get a clue about why the pod gets terminated, but found nothing.
Why does my pod get terminated by EKS? How should I troubleshoot it, and how can I avoid it?
Thanks for your help.
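One way to test the 128+SIGTERM theory would be to make the entrypoint log the signal before it dies, then check the container output (or CloudWatch, once logging is enabled) after the next termination. A rough sketch, assuming /bin/initMock.sh is a shell script you control and that the real robotframework command is started from it (the command name below is a placeholder):
#!/bin/sh
# hypothetical version of /bin/initMock.sh: record an external SIGTERM so it can be
# told apart from the automation itself crashing
trap 'echo "received SIGTERM at $(date -u), exiting" >&2; exit 143' TERM
# start the real suite in the background so this shell can still catch the signal
/path/to/run-robot-suite.sh &   # placeholder for the actual robotframework invocation
wait $!
If the message shows up right before the pod goes to Error, the process was killed from outside (Fargate rescheduling, scale-down, or an explicit delete) rather than failing on its own.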