Kubernetes job pod completed successfully but one of the containers was not ready

I've got some strange looking behavior.
When a job is run, it completes successfully, but one of the containers says it is not (or was not) ready:
NAMESPACE   NAME                                              READY   STATUS      RESTARTS   AGE   IP         NODE
default     **********-migration-22-20-16-29-11-2018-xnffp   1/2     Completed   0          11h   10.4.5.8   gke-******
job yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: migration-${timestamp_hhmmssddmmyy}
  labels:
    jobType: database-migration
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: app
        image: "${appApiImage}"
        imagePullPolicy: IfNotPresent
        command:
        - php
        - artisan
        - migrate
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=${SQL_INSTANCE_NAME}=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        securityContext:
          runAsUser: 2  # non-root user
          allowPrivilegeEscalation: false
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
What may be the cause of this behavior? There are no readiness or liveness probes defined on the containers.
If I do a describe on the pod, the relevant info is:
...
Command:
  php
  artisan
  migrate
State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Thu, 29 Nov 2018 22:20:18 +0000
  Finished:     Thu, 29 Nov 2018 22:20:19 +0000
Ready:          False
Restart Count:  0
Requests:
  cpu:  100m
...

A Pod with a Ready status means it "is able to serve requests and should be added to the load balancing pools of all matching Services"; see https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions
In your case, you don't want to serve requests, but simply to execute php artisan migrate once, and be done. So you don't have to worry about this status; the important part is the State: Terminated with a Reason: Completed and a zero exit code: your command ran to completion and exited successfully.
If the result of the command is not what you expected, you'd have to investigate the logs from the container that ran it with kubectl logs your-pod -c app (where app is the name of the container you defined), and/or you would expect the php artisan migrate command NOT to exit with a zero code.
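If you just want to confirm from the command line which containers terminated and with what code, a jsonpath query like the following should work (a sketch; your-pod is a placeholder):
kubectl get pod your-pod -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.terminated.exitCode}{"\n"}{end}'
This prints each container name followed by its terminal exit code; containers that are still running simply print an empty code.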

In my case, I was using Istio and experienced the same issue; removing the Istio sidecar from the Job pod solved the problem.
My solution if using Istio:
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"

Related

Mounting a camera to a pod gets MountVolume.SetUp failed for volume "default-token-c8hm5" : failed to sync secret cache: timed out waiting for the condition

On my Jetson NX, I'd like to write a YAML file that mounts 2 cameras into a pod.
The YAML:
containers:
- name: my-pod
  image: my_image:v1.0.0
  imagePullPolicy: Always
  volumeMounts:
  - mountPath: /dev/video0
    name: dev-video0
  - mountPath: /dev/video1
    name: dev-video1
  resources:
    limits:
      nvidia.com/gpu: 1
  ports:
  - containerPort: 9000
  command: ["/bin/bash"]
  args: ["-c", "while true; do echo hello; sleep 10; done"]
  securityContext:
    privileged: true
volumes:
- hostPath:
    path: /dev/video0
    type: ""
  name: dev-video0
- hostPath:
    path: /dev/video1
    type: ""
  name: dev-video1
but when I deploy it as a pod, I get the error:
MountVolume.SetUp failed for volume "default-token-c8hm5" : failed to sync secret cache: timed out waiting for the condition
I tried removing the volumes from the YAML, and then the pod deploys successfully. Any comments on this issue?
Another issue is that when a pod runs into problems, it consumes the rest of the storage on my Jetson NX; I guess k8s may create lots of temporary files or logs when something goes wrong? Is there any solution to this? Otherwise all of my pods will be evicted...

create an empty file inside a volume in Kubernetes pod

I have a legacy app which keeps checking an empty file inside a directory and performs a certain action if the file timestamp changes.
I am migrating this app to Kubernetes, so I want to create an empty file inside the pod. I tried subPath like below, but it doesn't create any file.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: demo
    image: alpine
    command: ["sleep", "3600"]
    volumeMounts:
    - name: volume-name
      mountPath: '/volume-name-path'
      subPath: emptyFile
  volumes:
  - name: volume-name
    emptyDir: {}
kubectl describe pod shows:
Containers:
  demo:
    Container ID:  containerd://0b824265e96d75c5f77918326195d6029e22d17478ac54329deb47866bf8192d
    Image:         alpine
    Image ID:      docker.io/library/alpine#sha256:08d6ca16c60fe7490c03d10dc339d9fd8ea67c6466dea8d558526b1330a85930
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      3600
    State:          Running
      Started:      Wed, 10 Feb 2021 12:23:43 -0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4gp4x (ro)
      /volume-name-path from volume-name (rw,path="emptyFile")
ls on the volume also shows nothing:
k8 exec -it demo-pod -c demo ls /volume-name-path
Any suggestions?
PS: I don't want to use a ConfigMap; I simply want to create an empty file.
If the objective is to create an empty file when the Pod starts, then the easiest way is to use either the entrypoint of the docker image or an init container.
With an initContainer, you could go with something like the following (or with a more complex init image with which you execute a whole bash script or something similar):
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  initContainers:
  - name: create-empty-file
    image: alpine
    command: ["touch", "/path/to/the/directory/empty_file"]
    volumeMounts:
    - name: volume-name
      mountPath: /path/to/the/directory
  containers:
  - name: demo
    image: alpine
    command: ["sleep", "3600"]
    volumeMounts:
    - name: volume-name
      mountPath: /path/to/the/directory
  volumes:
  - name: volume-name
    emptyDir: {}
Basically, the init container is executed first, runs its command and, if successful, terminates; only then does the main container start running. They share the same volumes (and can also mount them at different paths), so in the example the init container mounts the emptyDir volume, creates an empty file, and then completes. When the main container starts, the file is already there.
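To verify, you could exec into the running main container and list the shared directory (using the names from the example above):
kubectl exec demo-pod -c demo -- ls -l /path/to/the/directory
which should show the empty_file created by the init container.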
Regarding your legacy application which is being ported to Kubernetes:
If you have control of the Dockerfile, you could simply change it to create an empty file at the path where you expect it, so that when the app starts the file is already there, empty, from the beginning; just as you add the application to the container image, you can add other files too.
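For example, a minimal Dockerfile sketch of that idea (the base image and path are illustrative):
FROM alpine
# Create the directory and the empty marker file at image build time
RUN mkdir -p /path/to/the/directory && touch /path/to/the/directory/empty_file
CMD ["sleep", "3600"]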
For more info on init containers, please check the documentation: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
I think you may be interested in Container Lifecycle Hooks.
In this case, the PostStart hook may help create an empty file as soon as the container is started:
This hook is executed immediately after a container is created.
In the example below, I will show you how you can use the PostStart hook to create an empty file named file-test.
First I created a simple manifest file:
# demo-pod.yml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: demo-pod
  name: demo-pod
spec:
  containers:
  - image: alpine
    name: demo-pod
    command: ["sleep", "3600"]
    lifecycle:
      postStart:
        exec:
          command: ["touch", "/mnt/file-test"]
After creating the Pod, we can check if the demo-pod container has an empty file-test file:
$ kubectl apply -f demo-pod.yml
pod/demo-pod created
$ kubectl exec -it demo-pod -- sh
/ # ls -l /mnt/file-test
-rw-r--r-- 1 root root 0 Feb 11 09:08 /mnt/file-test
/ # cat /mnt/file-test
/ #

How to make a k8s pod exit when the main container exits?

I am using the sidecar pattern for a k8s pod, within which there are two containers: the main container and the sidecar container. I'd like the pod status to depend on the main container only (say, if the main container failed/completed, the pod should be in the same status) and to disregard the sidecar container.
Is there an elegant way of doing this?
Unfortunately the restartPolicy flag applies to all containers in the pod, so the simple solution isn't really going to work. Are you sure your logic shouldn't be in an initContainer rather than a sidecar? If it does need to be a sidecar, have it sleep forever at the end of its logic, as sketched below.
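A rough sketch of that idea (do-sidecar-work is a hypothetical placeholder for the sidecar's real logic):
containers:
- name: sidecar
  image: busybox
  # Run the sidecar logic, then block forever so this container never
  # exits on its own and the pod outcome is driven by the main container
  command: ["/bin/sh", "-c", "do-sidecar-work; while true; do sleep 3600; done"]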
As per the documentation:
Pod is running and has two Containers. Container 1 exits with failure.
  Log failure event.
  If restartPolicy is:
    Always: Restart Container; Pod phase stays Running.
    OnFailure: Restart Container; Pod phase stays Running.
    Never: Do not restart Container; Pod phase stays Running.
If Container 1 is not running, and Container 2 exits:
  Log failure event.
  If restartPolicy is:
    Always: Restart Container; Pod phase stays Running.
    OnFailure: Restart Container; Pod phase stays Running.
    Never: Pod phase becomes Failed.
As a workaround (a partial solution to this problem) with restartPolicy: Never, you can apply the result of a liveness probe from the main container to the sidecar container (using an exec, http, or tcp probe).
It's not a good solution when working with microservices.
Example:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness1
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /test-pd/healthy; sleep 30; rm -rf /test-pd/healthy; sleep 30
    livenessProbe:
      exec:
        command:
        - cat
        - /test-pd/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  - name: liveness2
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - sleep 120
    livenessProbe:
      exec:
        command:
        - cat
        - /test-pd2/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - mountPath: /test-pd2
      name: test-volume
  restartPolicy: Never
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      type: Directory
Please let me know if that helped.

Kubernetes doesn't allow mounting a file to a container

I ran into the error below when trying to deploy an application in a Kubernetes cluster. It looks like Kubernetes doesn't allow mounting a file to a container; do you know the possible reason?
deployment config file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: model-loader-service
  namespace: "{{ .Values.nsPrefix }}-aai"
spec:
  selector:
    matchLabels:
      app: model-loader-service
  template:
    metadata:
      labels:
        app: model-loader-service
      name: model-loader-service
    spec:
      containers:
      - name: model-loader-service
        image: "{{ .Values.image.modelLoaderImage }}:{{ .Values.image.modelLoaderVersion }}"
        imagePullPolicy: {{ .Values.pullPolicy }}
        env:
        - name: CONFIG_HOME
          value: /opt/app/model-loader/config/
        volumeMounts:
        - mountPath: /etc/localtime
          name: localtime
          readOnly: true
        - mountPath: /opt/app/model-loader/config/
          name: aai-model-loader-config
        - mountPath: /var/log/onap
          name: aai-model-loader-logs
        - mountPath: /opt/app/model-loader/bundleconfig/etc/logback.xml
          name: aai-model-loader-log-conf
          subPath: logback.xml
        ports:
        - containerPort: 8080
        - containerPort: 8443
      - name: filebeat-onap-aai-model-loader
        image: {{ .Values.image.filebeat }}
        imagePullPolicy: {{ .Values.pullPolicy }}
        volumeMounts:
        - mountPath: /usr/share/filebeat/filebeat.yml
          name: filebeat-conf
        - mountPath: /var/log/onap
          name: aai-model-loader-logs
        - mountPath: /usr/share/filebeat/data
          name: aai-model-loader-filebeat
      volumes:
      - name: localtime
        hostPath:
          path: /etc/localtime
      - name: aai-model-loader-config
        hostPath:
          path: "/dockerdata-nfs/{{ .Values.nsPrefix }}/aai/model-loader/appconfig/"
      - name: filebeat-conf
        hostPath:
          path: /dockerdata-nfs/{{ .Values.nsPrefix }}/log/filebeat/logback/filebeat.yml
Detailed information on this issue:
message: 'invalid header field value "oci runtime error: container_linux.go:247:
starting container process caused \"process_linux.go:359: container init
caused \\\"rootfs_linux.go:53: mounting \\\\\\\"/dockerdata-nfs/onap/log/filebeat/logback/filebeat.yml\\\\\\\"
to rootfs \\\\\\\"/var/lib/docker/aufs/mnt/7cd32a29938e9f70a727723f550474cb5b41c0966f45ad0c323360779f08cf5c\\\\\\\"
at \\\\\\\"/var/lib/docker/aufs/mnt/7cd32a29938e9f70a727723f550474cb5b41c0966f45ad0c323360779f08cf5c/usr/share/filebeat/filebeat.yml\\\\\\\"
caused \\\\\\\"not a directory\\\\\\\"\\\"\"\n"'
....
$ docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:38:45 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:38:45 2017
 OS/Arch:      linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:48:23Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.3-rancher3", GitCommit:"772c4c54e1f4ae7fc6f63a8e1ecd9fe616268e16", GitTreeState:"clean", BuildDate:"2017-11-27T19:51:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
The "not a directory" error is kind of self-explanatory. What is the exact volume and volumeMount definition you use? Do you use subPath in your declaration?
EDIT: change

- name: filebeat-conf
  hostPath:
    path: /dockerdata-nfs/{{ .Values.nsPrefix }}/log/filebeat/logback/filebeat.yml

to

- name: filebeat-conf
  hostPath:
    path: /dockerdata-nfs/{{ .Values.nsPrefix }}/log/filebeat/logback/

and add subPath: filebeat.yml to the volumeMount.
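Putting that edit together, the filebeat mount would look something like this (a sketch based on the manifest above):
volumeMounts:
- mountPath: /usr/share/filebeat/filebeat.yml
  name: filebeat-conf
  subPath: filebeat.yml
volumes:
- name: filebeat-conf
  hostPath:
    path: /dockerdata-nfs/{{ .Values.nsPrefix }}/log/filebeat/logback/
The volume now points at the directory on the host, and subPath projects only the filebeat.yml file onto the container path.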
SELinux may also be the culprit here. Log on to the node and execute sestatus. If the policy is disabled, you will see the output SELINUX=disabled; otherwise it will be something similar to this:
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             mcs
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31
First Option:
You can disable SELinux by editing the /etc/selinux/config file and updating SELINUX=permissive to SELINUX=disabled. Once done, reboot the machine and redeploy to see if this fixed it. However, this is not the recommended way and should be seen only as a temporary fix.
Second Option:
Log on to the node and execute ps -efZ | grep kubelet, which will give something like this:
system_u:system_r:kernel_t:s0 root 1592 1 2 May23 ? 09:58:18 /usr/local/bin/kubelet --anonymous-auth=false
Now, from this output, capture the string system_u:system_r:kernel_t:s0, which can be turned into a security context like the one below in your deployment:
securityContext:
  seLinuxOptions:
    user: system_u
    role: system_r
    type: spc_t
    level: s0
Deploy your application and check the logs to see if it is fixed. Do let me know if this works for you or if you need any further help.
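For checking the container logs, something like this should work (a sketch assuming a reasonably recent kubectl; the names are the ones defined in the manifest above):
kubectl logs deploy/model-loader-service -c model-loader-service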
Is this a multi-node cluster? If so, the file needs to exist on all Kubernetes nodes, since the pod is typically scheduled on a randomly available host machine. In any case, ConfigMaps are a much better way to supply static/read-only files to a container; a sketch follows below.
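A minimal sketch of the ConfigMap approach (the ConfigMap name is illustrative). First create the ConfigMap from the file:
kubectl create configmap filebeat-conf --from-file=filebeat.yml
then mount it in the pod spec:
volumeMounts:
- mountPath: /usr/share/filebeat/filebeat.yml
  name: filebeat-conf
  subPath: filebeat.yml
volumes:
- name: filebeat-conf
  configMap:
    name: filebeat-conf
With subPath, only the single filebeat.yml key is projected onto the mount path, so the rest of /usr/share/filebeat is left untouched.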

Why does a Kubernetes Pod get into a Terminated state with reason Completed and exit code 0?

I am struggling to find any answer to this in the Kubernetes documentation. The scenario is the following:
Kubernetes version 1.4 over AWS
8 pods running a NodeJS API (Express) deployed as a Kubernetes Deployment
One of the pods gets restarted for no apparent reason late at night (no traffic, no CPU spikes, no memory pressure, no alerts...). The number of restarts increases as a result.
Logs don't show anything abnormal (I ran kubectl logs -p to see the previous container's logs; no errors at all in there).
Resource consumption is normal, and I cannot see any events about Kubernetes rescheduling the pod onto another node or similar.
Describing the pod gives back a TERMINATED state, with a COMPLETED reason and exit code 0. I don't have the exact output from kubectl, as this pod has been replaced multiple times now.
The pods are NodeJS server instances; they cannot complete, they are always running, waiting for requests.
Would this be internal Kubernetes rearranging of pods? Is there any way to know when this happens? Shouldn't there be an event somewhere saying why it happened?
Update
This just happened in our prod environment. The result of describing the offending pod is:
api:
  Container ID:   docker://7a117ed92fe36a3d2f904a882eb72c79d7ce66efa1162774ab9f0bcd39558f31
  Image:          1.0.5-RC1
  Image ID:       docker://sha256:XXXX
  Ports:          9080/TCP, 9443/TCP
  State:          Running
    Started:      Mon, 27 Mar 2017 12:30:05 +0100
  Last State:     Terminated
    Reason:       Completed
    Exit Code:    0
    Started:      Fri, 24 Mar 2017 13:32:14 +0000
    Finished:     Mon, 27 Mar 2017 12:29:58 +0100
  Ready:          True
  Restart Count:  1
Update 2
Here is the deployment.yaml file used:
apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
  namespace: "${ENV}"
  name: "${APP}${CANARY}"
  labels:
    component: "${APP}${CANARY}"
spec:
  replicas: ${PODS}
  minReadySeconds: 30
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        component: "${APP}${CANARY}"
    spec:
      serviceAccount: "${APP}"
      ${IMAGE_PULL_SECRETS}
      containers:
      - name: "${APP}${CANARY}"
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
        image: "134078050561.dkr.ecr.eu-west-1.amazonaws.com/${APP}:${TAG}"
        env:
        - name: "KUBERNETES_CA_CERTIFICATE_FILE"
          value: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: "metadata.namespace"
        - name: "ENV"
          value: "${ENV}"
        - name: "PORT"
          value: "${INTERNAL_PORT}"
        - name: "CACHE_POLICY"
          value: "all"
        - name: "SERVICE_ORIGIN"
          value: "${SERVICE_ORIGIN}"
        - name: "DEBUG"
          value: "http,controllers:recommend"
        - name: "APPDYNAMICS"
          value: "true"
        - name: "VERSION"
          value: "${TAG}"
        ports:
        - name: "http"
          containerPort: ${HTTP_INTERNAL_PORT}
          protocol: "TCP"
        - name: "https"
          containerPort: ${HTTPS_INTERNAL_PORT}
          protocol: "TCP"
The Dockerfile of the image referenced in the above Deployment manifest:
FROM ubuntu:14.04
ENV NVM_VERSION v0.31.1
ENV NODE_VERSION v6.2.0
ENV NVM_DIR /home/app/nvm
ENV NODE_PATH $NVM_DIR/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/v$NODE_VERSION/bin:$PATH
ENV APP_HOME /home/app
RUN useradd -c "App User" -d $APP_HOME -m app
RUN apt-get update; apt-get install -y curl
USER app
# Install nvm with node and npm
RUN touch $HOME/.bashrc; curl https://raw.githubusercontent.com/creationix/nvm/${NVM_VERSION}/install.sh | bash \
&& /bin/bash -c 'source $NVM_DIR/nvm.sh; nvm install $NODE_VERSION'
ENV NODE_PATH $NVM_DIR/versions/node/$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/$NODE_VERSION/bin:$PATH
# Create app directory
WORKDIR /home/app
COPY . /home/app
# Install app dependencies
RUN npm install
EXPOSE 9080 9443
CMD [ "npm", "start" ]
npm start is an alias for a regular node app.js command that starts a NodeJS server on port 9080.
Check the version of docker you run, and whether the docker daemon was restarted during that time.
If the docker daemon was restarted, all the containers would be terminated (unless you use the new "live restore" feature in 1.12). In some docker versions, docker may incorrectly report "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
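To check whether the daemon was restarted around that time, on a systemd-based node something like this should work (a sketch):
# When did dockerd last (re)start?
systemctl show docker --property=ActiveEnterTimestamp
docker version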
If this is still relevant: we just had a similar problem in our cluster.
We managed to find more information by inspecting the logs from docker itself. SSH onto your k8s node and run the following:
sudo journalctl -fu docker.service
I had a similar problem when we upgraded to Airflow version 2.x: pods got restarted even after the DAGs ran successfully.
After a long time of debugging, I resolved it by overriding the pod template and specifying it in the airflow.cfg file.
[kubernetes]
....
pod_template_file = {{ .Values.airflow.home }}/pod_template.yaml
---
# pod_template.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  serviceAccountName: default
  restartPolicy: Never
  containers:
  - name: base
    image: dummy_image
    imagePullPolicy: IfNotPresent
    ports: []
    command: []