How to see Pod logs: a container name must be specified for pod... choose one of: [wait main] - kubernetes

I am running an Argo workflow and getting the following error in the pod's log:
error: a container name must be specified for pod <name>, choose one of: [wait main]
This error only happens some of the time and only with some of my templates, but when it does, it is a template that is run later in the workflow (i.e. not the first template run). I have not yet been able to identify the parameters that will run successfully, so I will be happy with tips for debugging. I have pasted the output of describe below.
Based on searches, I think the solution is simply that I need to attach "-c main" somewhere, but I do not know where and cannot find information in the Argo docs.
Describe:
Name: message-passing-1-q8jgn-607612432
Namespace: argo
Priority: 0
Node: REDACTED
Start Time: Wed, 17 Mar 2021 17:16:37 +0000
Labels: workflows.argoproj.io/completed=false
workflows.argoproj.io/workflow=message-passing-1-q8jgn
Annotations: cni.projectcalico.org/podIP: 192.168.40.140/32
cni.projectcalico.org/podIPs: 192.168.40.140/32
workflows.argoproj.io/node-name: message-passing-1-q8jgn.e
workflows.argoproj.io/outputs: {"exitCode":"6"}
workflows.argoproj.io/template:
{"name":"egress","arguments":{},"inputs":{...
Status: Failed
IP: 192.168.40.140
IPs:
IP: 192.168.40.140
Controlled By: Workflow/message-passing-1-q8jgn
Containers:
wait:
Container ID: docker://26d6c30440777add2af7ef3a55474d9ff36b8c562d7aecfb911ce62911e5fda3
Image: argoproj/argoexec:v2.12.10
Image ID: docker-pullable://argoproj/argoexec#sha256:6edb85a84d3e54881404d1113256a70fcc456ad49c6d168ab9dfc35e4d316a60
Port: <none>
Host Port: <none>
Command:
argoexec
wait
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 17 Mar 2021 17:16:43 +0000
Finished: Wed, 17 Mar 2021 17:17:03 +0000
Ready: False
Restart Count: 0
Environment:
ARGO_POD_NAME: message-passing-1-q8jgn-607612432 (v1:metadata.name)
Mounts:
/argo/podmetadata from podmetadata (rw)
/mainctrfs/mnt/logs from log-p1-vol (rw)
/mainctrfs/mnt/processed from processed-p1-vol (rw)
/var/run/docker.sock from docker-sock (ro)
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-v2w56 (ro)
main:
Container ID: docker://67e6d6d3717ab1080f14cac6655c90d990f95525edba639a2d2c7b3170a7576e
Image: REDACTED
Image ID: REDACTED
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
State: Terminated
Reason: Error
Exit Code: 6
Started: Wed, 17 Mar 2021 17:16:43 +0000
Finished: Wed, 17 Mar 2021 17:17:03 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt/logs/ from log-p1-vol (rw)
/mnt/processed/ from processed-p1-vol (rw)
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-v2w56 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
podmetadata:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
docker-sock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType: Socket
processed-p1-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: message-passing-1-q8jgn-processed-p1-vol
ReadOnly: false
log-p1-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: message-passing-1-q8jgn-log-p1-vol
ReadOnly: false
argo-token-v2w56:
Type: Secret (a volume populated by a Secret)
SecretName: argo-token-v2w56
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m35s default-scheduler Successfully assigned argo/message-passing-1-q8jgn-607612432 to ack1
Normal Pulled 7m31s kubelet Container image "argoproj/argoexec:v2.12.10" already present on machine
Normal Created 7m31s kubelet Created container wait
Normal Started 7m30s kubelet Started container wait
Normal Pulled 7m30s kubelet Container image already present on machine
Normal Created 7m30s kubelet Created container main
Normal Started 7m30s kubelet Started container main

This happens when you try to see logs for a pod with multiple containers and not specify for what container you want to see the log. Typical command to see logs:
kubectl logs <podname>
But your Pod has two container, one named "wait" and one named "main". You can see the logs from the container named "main" with:
kubectl logs <podname> -c main
or you can see the logs from all containers with
kubectl logs <podname> --all-containers

Related

ready 0/1 in bitnami/mongo

I am getting a 0/1 ready state when i run kubectl get pod -n mongodb
To install binami/mongo I ran:
helm repo add bitnami https://charts.bitnami.com/bitnami
kubectl create namespace mongodb
helm install mongodb -n mongodb bitnami/mongodb --values ./factory/yaml/mongodb/values.yaml
when I run kubectl get pod -n mongodb
NAME READY STATUS RESTARTS AGE
mongodb-79bf77f485-8bdm6 0/1 CrashLoopBackOff 69 (55s ago) 4h17m
Here I want the ready state to be 1/1 and status running
thenI ran kubectl describe pod -n mongodb to view the log, and I got
Name: mongodb-79bf77f485-8bdm6
Namespace: mongodb
Priority: 0
Node: ip-192-168-58-58.ec2.internal/192.168.58.58
Start Time: Tue, 19 Jul 2022 07:31:32 +0000
Labels: app.kubernetes.io/component=mongodb
app.kubernetes.io/instance=mongodb
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=mongodb
helm.sh/chart=mongodb-12.1.26
pod-template-hash=79bf77f485
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.47.246
IPs:
IP: 192.168.47.246
Controlled By: ReplicaSet/mongodb-79bf77f485
Containers:
mongodb:
Container ID: docker://11999c2e13e382ceb0a8ba2ea8255ed3d4dc07ca18659ee5a1fe1a8d071b10c0
Image: docker.io/bitnami/mongodb:4.4.2-debian-10-r0
Image ID: docker-pullable://bitnami/mongodb#sha256:add0ef947bc26d25b12ee1b01a914081e08b5e9242d2f9e34e2881b5583ce102
Port: 27017/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 19 Jul 2022 11:46:12 +0000
Finished: Tue, 19 Jul 2022 11:47:32 +0000
Ready: False
Restart Count: 69
Liveness: exec [/bitnami/scripts/ping-mongodb.sh] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [/bitnami/scripts/readiness-probe.sh] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
MONGODB_ROOT_USER: root
MONGODB_ROOT_PASSWORD: <set to the key 'mongodb-root-password' in secret 'mongodb'> Optional: false
ALLOW_EMPTY_PASSWORD: no
MONGODB_SYSTEM_LOG_VERBOSITY: 0
MONGODB_DISABLE_SYSTEM_LOG: no
MONGODB_DISABLE_JAVASCRIPT: no
MONGODB_ENABLE_JOURNAL: yes
MONGODB_PORT_NUMBER: 27017
MONGODB_ENABLE_IPV6: no
MONGODB_ENABLE_DIRECTORY_PER_DB: no
Mounts:
/bitnami/mongodb from datadir (rw)
/bitnami/scripts from common-scripts (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nsr69 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
common-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: mongodb-common-scripts
Optional: false
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongodb
ReadOnly: false
kube-api-access-nsr69:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 30m (x530 over 4h19m) kubelet Readiness probe failed: /bitnami/scripts/readiness-probe.sh: line 9: mongosh: command not found
Warning BackOff 9m59s (x760 over 4h10m) kubelet Back-off restarting failed container
Warning Unhealthy 5m6s (x415 over 4h19m) kubelet Liveness probe failed: /bitnami/scripts/ping-mongodb.sh: line 2: mongosh: command not found.
Don't understand the log or where the problem is coming from
How can I make the ready state 1/1 and a running status

Getting CrashBackloopError when deploying a pod

I am new to kubernetes and am trying to deploy a pod with private registry. Whenever I deploy this yaml it goes crash loop. Added sleep with a large value thinking that might cause this, still haven't worked.
apiVersion: v1
kind: Pod
metadata:
name: privetae-image-testing
spec:
containers:
- name: private-image-test
image: buildforjenkin.azurecr.io/nginx:latest
imagePullPolicy: IfNotPresent
command: ['echo','success','sleep 1000000']
Here are the logs:
Name: privetae-image-testing
Namespace: default
Priority: 0
Node: docker-desktop/192.168.65.4
Start Time: Sun, 24 Oct 2021 15:52:25 +0530
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.1.1.49
IPs:
IP: 10.1.1.49
Containers:
private-image-test:
Container ID: docker://46520936762f17b70d1ec92a121269e90aef2549390a14184e6c838e1e6bafec
Image: buildforjenkin.azurecr.io/nginx:latest
Image ID: docker-pullable://buildforjenkin.azurecr.io/nginx#sha256:7250923ba3543110040462388756ef099331822c6172a050b12c7a38361ea46f
Port: <none>
Host Port: <none>
Command:
echo
success
sleep 1000000
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 24 Oct 2021 15:52:42 +0530
Finished: Sun, 24 Oct 2021 15:52:42 +0530
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ld6zz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-ld6zz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34s default-scheduler Successfully assigned default/privetae-image-testing to docker-desktop
Normal Pulled 17s (x3 over 33s) kubelet Container image "buildforjenkin.azurecr.io/nginx:latest" already present on machine
Normal Created 17s (x3 over 33s) kubelet Created container private-image-test
Normal Started 17s (x3 over 33s) kubelet Started container private-image-test
Warning BackOff 2s (x5 over 31s) kubelet Back-off restarting failed container
I am running the cluster on docker-desktop on windows. TIA
Notice you are using standard nginx image? Try delete your pod and re-apply with:
apiVersion: v1
kind: Pod
metadata:
name: private-image-testing
labels:
run: my-nginx
spec:
restartPolicy: Always
containers:
- name: private-image-test
image: buildforjenkin.azurecr.io/nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: http
If your pod runs, you should be able to remote into with kubectl exec -it private-image-testing -- sh, follow by wget -O- localhost should print you a welcome message. If it still fail, paste the output of kubectl logs -f -l run=my-nginx to your question.
Check my previous answer to understand step-by step whats going on after you launch the container.
You are launching some nginx:latest container with the process inside that runs forever as it should be to avoid main process be exited. Then you add overlay that (I will quote David: print the words success and sleep 1000000, and having printed those words, then exit).
Instead of making your container running all the time to serve, you explicitly shooting into your leg by finishing the process using sleep 1000000.
And sure, your command will be executed and container will exit. Check that. It was exited correctly with status 0 and did that 2 times. And will more in the future.
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 24 Oct 2021 15:52:42 +0530
Finished: Sun, 24 Oct 2021 15:52:42 +0530
You need to think well if you really need command: ['echo','success','sleep 1000000']

what is ` i/o timeout` issue in kubernete pods?

I deployed a kubernete cluster with two EC2 instances.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-21-12 Ready control-plane,master 36h v1.20.2
ip-172-31-21-62 Ready <none> 12h v1.20.2
There is an error pod in the cluster:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
es-cluster-0 0/1 Error 141 12h
I see this error in the log of this pod:
$ kubectl logs pods/es-cluster-0
Error from server: Get "https://172.31.21.62:10250/containerLogs/default/es-cluster-0/elasticsearch": dial tcp 172.31.21.62:10250: i/o timeout
It is talking about having error with the node 172.31.21.62:10250. how can I fix the issue? I am not sure that this error mean?
Below command is from master node:
kubectl describe pods/es-cluster-0 command output:
$ kubectl describe pods/es-cluster-0
Name: es-cluster-0
Namespace: default
Priority: 0
Node: ip-172-31-21-62/172.31.21.62
Start Time: Wed, 17 Feb 2021 10:08:04 +0000
Labels: controller-revision-hash=es-cluster-55b6944c56
name=elasticsearch
statefulset.kubernetes.io/pod-name=es-cluster-0
Annotations: <none>
Status: Running
IP: 10.32.0.2
IPs:
IP: 10.32.0.2
Controlled By: StatefulSet/es-cluster
Containers:
elasticsearch:
Container ID: docker://838e02ff6fba31234656e68b804f49e86ec7fea0053e5d1062abdd9d24b728b9
Image: elasticsearch:7.10.1
Image ID: docker-pullable://elasticsearch#sha256:7cd88158f6ac75d43b447fdd98c4eb69483fa7bf1be5616a85fe556262dc864a
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 78
Started: Thu, 18 Feb 2021 01:40:23 +0000
Finished: Thu, 18 Feb 2021 01:40:39 +0000
Ready: False
Restart Count: 178
Environment: <none>
Mounts:
/usr/share/elasticsearch/config/elasticsearch.yml from elasticsearch-config (rw,path="elasticsearch.yml")
/var/run/secrets/kubernetes.io/serviceaccount from default-token-ntgwr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
elasticsearch-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: es-config
Optional: false
default-token-ntgwr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-ntgwr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 42s (x4082 over 15h) kubelet Back-off restarting failed container
Below command is from master node:
$ netstat -tunlp| grep 10250
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::10250 :::* LISTEN -
Below command is from worker node:
$ kubectl describe pods/es-cluster-0
Name: es-cluster-0
Namespace: default
Priority: 0
Node: ip-172-31-21-62/172.31.21.62
Start Time: Wed, 17 Feb 2021 10:08:04 +0000
Labels: controller-revision-hash=es-cluster-55b6944c56
name=elasticsearch
statefulset.kubernetes.io/pod-name=es-cluster-0
Annotations: <none>
Status: Running
IP: 10.32.0.2
IPs:
IP: 10.32.0.2
Controlled By: StatefulSet/es-cluster
Containers:
elasticsearch:
Container ID: docker://0efa71da7b70725b248cfe8dfae5560c2fc95aaf40e3a3a899970e8579e7ac27
Image: elasticsearch:7.10.1
Image ID: docker-pullable://elasticsearch#sha256:7cd88158f6ac75d43b447fdd98c4eb69483fa7bf1be5616a85fe556262dc864a
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 78
Started: Thu, 18 Feb 2021 05:31:55 +0000
Finished: Thu, 18 Feb 2021 05:32:12 +0000
Ready: False
Restart Count: 221
Environment: <none>
Mounts:
/usr/share/elasticsearch/config/elasticsearch.yml from elasticsearch-config (rw,path="elasticsearch.yml")
/var/run/secrets/kubernetes.io/serviceaccount from default-token-ntgwr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
elasticsearch-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: es-config
Optional: false
default-token-ntgwr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-ntgwr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 55m (x211 over 19h) kubelet Container image "elasticsearch:7.10.1" already present on machine
Warning BackOff 54s (x5087 over 19h) kubelet Back-off restarting failed container
Below command is from worker node:
$ netstat -tunlp| grep 10250
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::10250 :::* LISTEN -

Kubernetes master doesn't attach FlexVolume

I'm trying to attach the dummy-attachable FlexVolume sample for Kubernetes which seems to initialize normally according to my logs on both the nodes and master:
Loaded volume plugin "flexvolume-k8s/dummy-attachable
But when I try to attach the volume to a pod, the attach method never gets called from the master. The logs from the node read:
flexVolume driver k8s/dummy-attachable: using default GetVolumeName for volume dummy-attachable
operationExecutor.VerifyControllerAttachedVolume started for volume "dummy-attachable"
Operation for "\"flexvolume-k8s/dummy-attachable/dummy-attachable\"" failed. No retries permitted until 2019-04-22 13:42:51.21390334 +0000 UTC m=+4814.674525788 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"dummy-attachable\" (UniqueName: \"flexvolume-k8s/dummy-attachable/dummy-attachable\") pod \"nginx-dummy-attachable\"
Here's how I'm attempting to mount the volume:
apiVersion: v1
kind: Pod
metadata:
name: nginx-dummy-attachable
namespace: default
spec:
containers:
- name: nginx-dummy-attachable
image: nginx
volumeMounts:
- name: dummy-attachable
mountPath: /data
ports:
- containerPort: 80
volumes:
- name: dummy-attachable
flexVolume:
driver: "k8s/dummy-attachable"
Here is the ouput of kubectl describe pod nginx-dummy-attachable:
Name: nginx-dummy-attachable
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: [node id]
Start Time: Wed, 24 Apr 2019 08:03:21 -0400
Labels: <none>
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container nginx-dummy-attachable
Status: Pending
IP:
Containers:
nginx-dummy-attachable:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/data from dummy-attachable (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hcnhj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
dummy-attachable:
Type: FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
Driver: k8s/dummy-attachable
FSType:
SecretRef: nil
ReadOnly: false
Options: map[]
default-token-hcnhj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hcnhj
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 41s (x6 over 11m) kubelet, [node id] Unable to mount volumes for pod "nginx-dummy-attachable_default([id])": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx-dummy-attachable". list of unmounted volumes=[dummy-attachable]. list of unattached volumes=[dummy-attachable default-token-hcnhj]
I added debug logging to the FlexVolume, so I was able to verify that the attach method was never called on the master node. I'm not sure what I'm missing here.
I don't know if this matters, but the cluster is being launched with KOPS. I've tried with both k8s 1.11 and 1.14 with no success.
So this is a fun one.
Even though kubelet initializes the FlexVolume plugin on master, kube-controller-manager, which is containerized in KOPs, is the application that's actually responsible for attaching the volume to the pod. KOPs doesn't mount the default plugin directory /usr/libexec/kubernetes/kubelet-plugins/volume/exec into the kube-controller-manager pod, so it doesn't know anything about your FlexVolume plugins on master.
There doesn't appear to be a non-hacky way to do this other than to use a different Kubernetes deployment tool until KOPs addresses this problem.

Kubernetes init containers run every hour

I have recently set up redis via https://github.com/tarosky/k8s-redis-ha, this repo includes an init container, and I have included an extra init container in order to get passwords etc set up.
I am seeing some strange (and it seems undocumented) behavior, whereby the init containers run as expected before the redis container starts, however then they run subsequently every hour, close to an hour. I have tested this behavior using a busybox init container (which does nothing) on deployments & statefulset and experience the same behavior, so it is not specific to this redis pod.
I have tested this on bare metal with k8s 1.6 and 1.8 with the same results, however when applying init containers to GKE (k8s 1.7) this behavior does not happen. I can't see any flags for GKE's kubelet to dictate this behavior.
See below for kubectl describe pod showing that the init containers are run when the main pod has not exited/crashed.
Name: redis-sentinel-1
Namespace: (redacted)
Node: (redacted)/(redacted)
Start Time: Mon, 12 Mar 2018 06:20:55 +0000
Labels: app=redis-sentinel
controller-revision-hash=redis-sentinel-7cc557cf7c
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"StatefulSet","namespace":"(redacted)","name":"redis-sentinel","uid":"759a3a3b-25bd-11e8-a8ce-0242ac110...
security.alpha.kubernetes.io/unsafe-sysctls=net.core.somaxconn=1024
Status: Running
IP: (redacted)
Controllers: StatefulSet/redis-sentinel
Init Containers:
redis-ha-server:
Container ID: docker://557d777a7c660b062662426ebe9bbf6f9725fb9d88f89615a8881346587c1835
Image: tarosky/k8s-redis-ha:sentinel-3.0.1
Image ID: docker-pullable://tarosky/k8s-redis-ha#sha256:98e09ef5fbea5bfd2eb1858775c967fa86a92df48e2ec5d0b405f7ca3f5ada1c
Port:
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 13 Mar 2018 03:01:12 +0000
Finished: Tue, 13 Mar 2018 03:01:12 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/opt from opt (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
-redis-init:
Container ID: docker://18c4e353233a6827999ae4a16adf1f408754a21d80a8e3374750fdf9b54f9b1a
Image: gcr.io/(redacted)/redis-init
Image ID: docker-pullable://gcr.io/(redacted)/redis-init#sha256:42042093d58aa597cce4397148a2f1c7967db689256ed4cc8d9f42b34d53aca2
Port:
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 13 Mar 2018 03:01:25 +0000
Finished: Tue, 13 Mar 2018 03:01:25 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/opt from opt (rw)
/secrets/redis-password from redis-password (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
Containers:
redis-sentinel:
Container ID: docker://a54048cbb7ec535c841022c543a0d566c9327f37ede3a6232516721f0e37404d
Image: redis:3.2
Image ID: docker-pullable://redis#sha256:474fb41b08bcebc933c6337a7db1dc7131380ee29b7a1b64a7ab71dad03ad718
Port: 26379/TCP
Command:
/opt/bin/k8s-redis-ha-sentinel
Args:
/opt/sentinel.conf
State: Running
Started: Mon, 12 Mar 2018 06:21:02 +0000
Ready: True
Restart Count: 0
Readiness: exec [redis-cli -p 26379 info server] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SERVICE: redis-server
SERVICE_PORT: redis-server
Mounts:
/opt from opt (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
redis-sword:
Container ID: docker://50279448bbbf175b6f56f96dab59061c4652c2117452ed15b3a5380681c7176f
Image: tarosky/k8s-redis-ha:sword-3.0.1
Image ID: docker-pullable://tarosky/k8s-redis-ha#sha256:2315c7a47d9e47043d030da270c9a1252c2cfe29c6e381c8f50ca41d3065db6d
Port:
State: Running
Started: Mon, 12 Mar 2018 06:21:03 +0000
Ready: True
Restart Count: 0
Environment:
SERVICE: redis-server
SERVICE_PORT: redis-server
SENTINEL: redis-sentinel
SENTINEL_PORT: redis-sentinel
Mounts:
/opt from opt (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hkj6d (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
opt:
Type: HostPath (bare host directory volume)
Path: /store/redis-sentinel/opt
redis-password:
Type: Secret (a volume populated by a Secret)
SecretName: redis-password
Optional: false
default-token-hkj6d:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hkj6d
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
20h 30m 21 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Pulling pulling image "tarosky/k8s-redis-ha:sentinel-3.0.1"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Started Started container
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Created Created container
20h 30m 21 kubelet, 10.1.3.102 spec.initContainers{redis-ha-server} Normal Pulled Successfully pulled image "tarosky/k8s-redis-ha:sentinel-3.0.1"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Pulling pulling image "gcr.io/(redacted)/redis-init"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Pulled Successfully pulled image "gcr.io/(redacted)/redis-init"
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Created Created container
21h 30m 22 kubelet, 10.1.3.102 spec.initContainers{redis-init} Normal Started Started container
Note the Containers in the pod which started at Mon, 12 Mar 2018 06:21:02 +0000 (with 0 restarts) and the Init Containers which started from Tue, 13 Mar 2018 03:01:12 +0000. These seem to re-run every hour pretty much in an interval of hour.
Our bare metal must be misconfigured for init containers somewhere? Can anyone shed any light on this strange behavior?
If you are pruning away exited containers, then the container pruning/removal is a likely cause. In my testing, it appears that exited init containers which are removed from Docker Engine (hourly, or otherwise), such as with "docker system prune -f" will cause Kubernetes to re-launch the init containers. Is this the issue in your case, if this is still persisting?
Also, see https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/ for Kubelet garbage collection documentation, which appears to support these types of tasks (rather than needing to implement it yourself)