Kubernetes Pod's containers not running when using sh commands - kubernetes

The pod's containers never become ready: every time they run an sh command (/bin/sh as well) they end up in the Waiting state again.
For example, the containers of every pod shown at https://v1-17.docs.kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#define-container-environment-variables-with-data-from-multiple-configmaps go to "Completed" status after executing the sh command, or, if I set "restartPolicy: Always", they sit in the "Waiting" state with reason CrashLoopBackOff.
Containers work fine if I do not set any command on them.
If I run the sh command within the container, I can see with "kubectl logs" after creation that the env variable was set correctly.
The expected behaviour is for the pod's containers to keep running after they execute the sh command.
I cannot find references regarding this particular problem and would appreciate a little help if possible. Thank you very much in advance!
Please disregard the image: I tried different ones and the problem happens either way.
environment: Kubernetes v 1.17.1 on qemu VM
yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
data:
  how: very
---
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: nginx
      ports:
        - containerPort: 88
      command: [ "/bin/sh", "-c", "env" ]
      env:
        # Define the environment variable
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              # The ConfigMap containing the value you want to assign to SPECIAL_LEVEL_KEY
              name: special-config
              # Specify the key associated with the value
              key: how
  restartPolicy: Always
describe pod:
kubectl describe pod dapi-test-pod
Name: dapi-test-pod
Namespace: default
Priority: 0
Node: kw1/10.1.10.31
Start Time: Thu, 21 May 2020 01:02:17 +0000
Labels: <none>
Annotations: cni.projectcalico.org/podIP: 192.168.159.83/32
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"dapi-test-pod","namespace":"default"},"spec":{"containers":[{"command...
Status: Running
IP: 192.168.159.83
IPs:
IP: 192.168.159.83
Containers:
test-container:
Container ID: docker://63040ec4d0a3e78639d831c26939f272b19f21574069c639c7bd4c89bb1328de
Image: nginx
Image ID: docker-pullable://nginx@sha256:30dfa439718a17baafefadf16c5e7c9d0a1cde97b4fd84f63b69e13513be7097
Port: 88/TCP
Host Port: 0/TCP
Command:
/bin/sh
-c
env
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 21 May 2020 01:13:21 +0000
Finished: Thu, 21 May 2020 01:13:21 +0000
Ready: False
Restart Count: 7
Environment:
SPECIAL_LEVEL_KEY: <set to the key 'how' of config map 'special-config'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbsw (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-zqbsw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-zqbsw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13m default-scheduler Successfully assigned default/dapi-test-pod to kw1
Normal Pulling 12m (x4 over 13m) kubelet, kw1 Pulling image "nginx"
Normal Pulled 12m (x4 over 13m) kubelet, kw1 Successfully pulled image "nginx"
Normal Created 12m (x4 over 13m) kubelet, kw1 Created container test-container
Normal Started 12m (x4 over 13m) kubelet, kw1 Started container test-container
Warning BackOff 3m16s (x49 over 13m) kubelet, kw1 Back-off restarting failed container

You can use this manifest. The command ["/bin/sh", "-c"] says "run a shell and execute the following instructions". The args are then passed to the shell as the commands to run. A multiline arg keeps it simple and easy to read. Your pod will print its environment variables and then start the NGINX process, which keeps running:
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
data:
  how: very
---
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: nginx
      ports:
        - containerPort: 88
      command: ["/bin/sh", "-c"]
      args:
        - env;
          nginx -g 'daemon off;';
      env:
        # Define the environment variable
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              # The ConfigMap containing the value you want to assign to SPECIAL_LEVEL_KEY
              name: special-config
              # Specify the key associated with the value
              key: how
  restartPolicy: Always

This happens because the process you are running in the container has completed, so the container shuts down and Kubernetes marks the pod as completed.
Whether the command comes from the image's CMD or from your own override, as here, the container shuts down once that command completes. It's the same reason Ubuntu, run with plain docker, starts up and then shuts down right away.
For a pod and its underlying docker container to keep running, you need to start a process that keeps running. In your case, the env command completes right away.
If you set the pod's restart policy to Always, Kubernetes keeps restarting the container, waiting longer between attempts each time (the back-off delay is capped at five minutes). That waiting period is what shows up as CrashLoopBackOff.
One-off commands like the one you're running are useful for utility-type tasks, i.e. do one thing and then get rid of the pod.
For example:
kubectl run tester --generator run-pod/v1 --image alpine --restart Never --rm -it -- /bin/sh -c env
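The same one-off pattern can also be written as a manifest. The following is a minimal sketch, not the answerer's own file: the pod name `tester` and the `alpine` image are taken from the command above, and `restartPolicy: Never` plays the role of `--restart Never`. With this policy the pod should end in the Completed state instead of CrashLoopBackOff once `env` has printed its output.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tester
spec:
  # Never restart: a finished container leaves the pod in "Completed"
  restartPolicy: Never
  containers:
    - name: tester
      image: alpine
      # Runs once, prints the environment, exits with code 0
      command: ["/bin/sh", "-c", "env"]
```

Unlike the `--rm` variant, a pod created this way stays around until you delete it, so `kubectl logs tester` remains available afterwards.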
To run something longer, start a process that continues running.
For example:
kubectl run tester --generator run-pod/v1 --image alpine -- /bin/sh -c "sleep 30"
That command will run for 30 seconds, so the pod will also run for 30 seconds. It also uses the default restart policy of Always, so after 30 seconds the process completes, Kubernetes marks the pod as complete, and then restarts it to do the same thing again.
Generally, pods start a long-running process, like a web server. For Kubernetes to know whether that pod is healthy, so it can do its high-availability magic and restart it if it crashes, you can configure readiness and liveness probes.
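As an illustration of the probes mentioned above, here is a hedged sketch of an nginx pod with both kinds. The pod name `probe-demo`, the probed path `/`, and all timing values are assumptions chosen for the example, not values from the question.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # hypothetical name
spec:
  containers:
    - name: web
      image: nginx            # long-running process: the default nginx command is kept
      ports:
        - containerPort: 80
      # Readiness: while this fails, the pod is taken out of Service endpoints
      readinessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5    # assumed value
        periodSeconds: 10         # assumed value
      # Liveness: when this fails repeatedly, the kubelet restarts the container
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 15   # assumed value
        periodSeconds: 20         # assumed value
```

Note that only the liveness probe leads to a restart. The readiness probe only controls whether the pod receives traffic.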

Related

Why is Kubernetes pod failing to start?

I was trying to test a scenario where a pod mounts a volume and tries to write a file to it. The YAML below works fine when I exclude command and args. With command and args, however, it fails with "CrashLoopBackOff".
The describe command does not provide much information about the failure. What's wrong here?
Note: I was running this YAML on Katacoda.
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    run: voltest
  name: voltest
spec:
  replicas: 1
  selector:
    matchLabels:
      run: voltest
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: voltest
    spec:
      containers:
        - image: nginx
          name: voltest
          volumeMounts:
            - mountPath: /var/local/aaa
              name: mydir
          command: ["/bin/sh"]
          args: ["-c", "echo 'test complete' > /var/local/aaa/testOut.txt"]
      volumes:
        - name: mydir
          hostPath:
            path: /var/local/aaa
            type: DirectoryOrCreate
Describe command output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 49s default-scheduler Successfully assigned default/voltest-78678dd56c-h5frs to controlplane
Normal Pulling 19s (x3 over 48s) kubelet, controlplane Pulling image "nginx"
Normal Pulled 17s (x3 over 39s) kubelet, controlplane Successfully pulled image "nginx"
Normal Created 17s (x3 over 39s) kubelet, controlplane Created container voltest
Normal Started 17s (x3 over 39s) kubelet, controlplane Started container voltest
Warning BackOff 5s (x4 over 35s) kubelet, controlplane Back-off restarting failed container
You've configured your pod to run a single shell command:
command: ["/bin/sh"]
args: ["-c", "echo 'test complete' > /var/testOut.txt"]
This means that the pod starts up, runs echo 'test complete' > /var/testOut.txt, and then immediately exits. From the perspective
of kubernetes, this is a crash.
You've replaced the default behavior of the nginx image ("run
nginx") with a shell command.
If you want the pod to continue running, you'll need to arrange for it
to run some sort of long-running command. A simple solution would be
something like:
command: ["/bin/sh"]
args: ["-c", "echo 'test complete' > /var/testOut.txt; sleep 3600"]
This will cause the pod to sleep for an hour before exiting, giving
you time to inspect the results of your shell command.
Note that your shell command isn't testing anything useful; you've
mounted your mydir volume on /var/local/aaa, but your shell
command is writing to /var/testOut.txt, so it's not making any use
of the volume.
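Combining the two remarks above, a sketch of the pod template's `spec` might look like the following. It is only a fragment of the Deployment from the question, and it assumes you want the file to land on the `mydir` volume and the container to stay up for an hour so you can inspect it.

```yaml
# Fragment: goes under spec.template.spec of the Deployment in the question
containers:
  - image: nginx
    name: voltest
    volumeMounts:
      - mountPath: /var/local/aaa
        name: mydir
    command: ["/bin/sh"]
    # Write into the mounted directory, then sleep so the container does not exit
    args: ["-c", "echo 'test complete' > /var/local/aaa/testOut.txt; sleep 3600"]
volumes:
  - name: mydir
    hostPath:
      path: /var/local/aaa
      type: DirectoryOrCreate
```

Once the sleep ends the container exits and is restarted again, so this is suited to a test, not to a long-lived workload.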

Back-off restarting failed container In Azure AKS

A Linux container pod, using a Docker image from Azure Container Registry, keeps restarting with restartPolicy set to Always. The pod description is below.
kubectl describe pod example-pod
...
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 11 Jun 2020 03:27:11 +0000
Finished: Thu, 11 Jun 2020 03:27:12 +0000
...
Back-off restarting failed container
This pod is created with a secret to access the ACR repository.
The reason is that the pod completes execution successfully with exit code 0, whereas it should keep listening on a particular port. The relevant Microsoft documentation is the "Container Group Runtime" page, under the header "Container continually exits and restarts".
The content of deployment-example.yml is below.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  namespace: development
  labels:
    app: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: contentocr.azurecr.io/example:latest
          #command: ["ping -t localhost"]
          imagePullPolicy: Always
          ports:
            - name: http-port
              containerPort: 3000
      imagePullSecrets:
        - name: regpass
      restartPolicy: Always
      nodeSelector:
        agent: linux
---
apiVersion: v1
kind: Service
metadata:
  name: example
  namespace: development
  labels:
    app: example
spec:
  ports:
    - name: http-port
      port: 3000
      targetPort: 3000
  selector:
    app: example
  type: LoadBalancer
Output of kubectl get events is as below.
3m39s Normal Scheduled pod/example-deployment-5dc964fcf8-gbm5t Successfully assigned development/example-deployment-5dc964fcf8-gbm5t to aks-agentpool-18342716-vmss000000
2m6s Normal Pulling pod/example-deployment-5dc964fcf8-gbm5t Pulling image "contentocr.azurecr.io/example:latest"
2m5s Normal Pulled pod/example-deployment-5dc964fcf8-gbm5t Successfully pulled image "contentocr.azurecr.io/example:latest"
2m5s Normal Created pod/example-deployment-5dc964fcf8-gbm5t Created container example
2m49s Normal Started pod/example-deployment-5dc964fcf8-gbm5t Started container example
2m20s Warning BackOff pod/example-deployment-5dc964fcf8-gbm5t Back-off restarting failed container
6m6s Normal SuccessfulCreate replicaset/example-deployment-5dc964fcf8 Created pod: example-deployment-5dc964fcf8-2fdt5
3m39s Normal SuccessfulCreate replicaset/example-deployment-5dc964fcf8 Created pod: example-deployment-5dc964fcf8-gbm5t
6m6s Normal ScalingReplicaSet deployment/example-deployment Scaled up replica set example-deployment-5dc964fcf8 to 1
3m39s Normal ScalingReplicaSet deployment/example-deployment Scaled up replica set example-deployment-5dc964fcf8 to 1
3m38s Normal EnsuringLoadBalancer service/example Ensuring load balancer
3m34s Normal EnsuredLoadBalancer service/example Ensured load balancer
The Dockerfile entry point is ENTRYPOINT ["npm", "start"] with CMD ["tail -f /dev/null/"].
It runs locally, where the CI="true" flag is assigned implicitly. In docker-compose, however, stdin_open: true or tty: true has to be set, and in the Kubernetes deployment file an environment variable named CI has to be set to "true".
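In the deployment file from the question, setting that variable could look like the fragment below. This is a sketch under the assumption that the application started by `npm start` keeps running when `CI` is `"true"`. The commented `stdin`/`tty` lines are the container-level counterparts of docker-compose's `stdin_open` and `tty`, shown only as an alternative to try.

```yaml
# Fragment: goes under spec.template.spec of example-deployment
containers:
  - name: example
    image: contentocr.azurecr.io/example:latest
    imagePullPolicy: Always
    env:
      # Environment variable values must be strings, hence the quotes
      - name: CI
        value: "true"
    # Alternative to the CI variable (assumption, mirrors the docker-compose settings):
    # stdin: true
    # tty: true
    ports:
      - name: http-port
        containerPort: 3000
```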
The command below solved my problem:
az aks update -n aks-nks-k8s-cluster -g aks-nks-k8s-rg --attach-acr aksnksk8s
After executing it, the following is displayed:
Add ROLE Propagation done [###############] 100.0000%
and then
Running.. followed by the response after some time.
Here:
aks-nks-k8s-cluster: the name of the cluster I created and am using
aks-nks-k8s-rg: the resource group I created and am using
aksnksk8s: the container registry I created and am using

Kubernetes check readinessProbe at Service/Deployment level

Is there a way to request the status of a readinessProbe by using the name of a service linked to a deployment? From an initContainer, for example?
Imagine we have a deployment X that uses a readinessProbe, with a service linked to it, so we can request, for example, http://service-X:8080.
Now we create a deployment Y, and in its initContainer we want to know whether deployment X is ready. Is there a way to ask something like deployment-X.ready or service-X.ready?
I know that the correct way to handle dependencies is to let Kubernetes do it for us, but I have a container that doesn't crash and I have no control over it...
You can add an nginx proxy sidecar to deployment Y.
Set the deploymentY.initContainer.readinessProbe to a port on nginx, and have that port proxied to deploymentY.readinessProbe.
Instead of a readinessProbe, you can simply use an initContainer.
You create pod/deployment X, create service X, and add an initContainer that looks for service X.
If it finds the service, the pod starts.
If it does not, it keeps looking until service X is created.
As a simple example, we create an nginx deployment with kubectl apply -f nginx.yaml.
nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
        - name: my-nginx
          image: nginx
          ports:
            - containerPort: 80
Then we create the pod with the initContainer:
initContainer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox:1.28
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
    - name: init-myservice
      image: busybox:1.28
      command: ['sh', '-c', 'until nslookup my-nginx; do echo waiting for myapp-pod2; sleep 2; done;']
The initContainer looks for the service my-nginx. Until you create it, the pod stays in Init:0/1 status.
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:0/1 0 15m
After you add the service, for example with kubectl expose deployment/my-nginx, the initContainer finds the my-nginx service and the pod is started.
NAME READY STATUS RESTARTS AGE
myapp-pod 1/1 Running 0 35m
Result:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/myapp-pod to kubeadm2
Normal Pulled 20s kubelet, kubeadm2 Container image "busybox:1.28" already present on machine
Normal Created 20s kubelet, kubeadm2 Created container init-myservice
Normal Started 20s kubelet, kubeadm2 Started container init-myservice
Normal Pulled 20s kubelet, kubeadm2 Container image "busybox:1.28" already present on machine
Normal Created 20s kubelet, kubeadm2 Created container myapp-container
Normal Started 20s kubelet, kubeadm2 Started container myapp-container
Let me know if that answers your question.
I finally found a solution by following this link:
https://blog.giantswarm.io/wait-for-it-using-readiness-probes-for-service-dependencies-in-kubernetes/
We first need to create a ServiceAccount in Kubernetes that allows an initContainer to list endpoints. After that, we ask for the available endpoints: if there is at least one, the dependency is ready (in my case).
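The approach described above could be sketched roughly as follows. This is not the manifest from the linked post: every name (`endpoint-reader`, `service-x`, `deployment-y-pod`), the `default` namespace, and the init container image are hypothetical. The image only needs to contain `kubectl` and a shell. Only addresses of ready pods are listed under `subsets[].addresses`, so a non-empty result is taken to mean the dependency is ready.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: endpoint-reader            # hypothetical name
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpoint-reader
rules:
  # Allow reading Endpoints objects in this namespace only
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: endpoint-reader
subjects:
  - kind: ServiceAccount
    name: endpoint-reader
    namespace: default             # assumption: everything lives in "default"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader
---
apiVersion: v1
kind: Pod
metadata:
  name: deployment-y-pod           # hypothetical stand-in for a pod of deployment Y
spec:
  serviceAccountName: endpoint-reader
  initContainers:
    - name: wait-for-service-x
      image: example.registry/kubectl:latest   # hypothetical image with kubectl and sh
      command:
        - /bin/sh
        - -c
        - |
          # Loop until service-x has at least one ready endpoint address
          until kubectl get endpoints service-x -o jsonpath='{.subsets[*].addresses[*].ip}' | grep -q .; do
            echo "waiting for service-x endpoints"; sleep 2
          done
  containers:
    - name: app
      image: busybox:1.28
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
```

A namespaced Role is used instead of a ClusterRole so that the init container can read endpoints only where it needs to.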

Kubernetes NFS volume mount fail with exit status 32

I have a Kubernetes setup installed on my Ubuntu machine. I'm trying to set up an NFS volume and mount it into a container according to this document: http://kubernetes.io/v1.1/examples/nfs/
nfs service and pod configurations
kind: Service
apiVersion: v1
metadata:
  name: nfs-server
spec:
  ports:
    - port: 2049
  selector:
    role: nfs-server
---
apiVersion: v1
kind: Pod
metadata:
  name: nfs-server
  labels:
    role: nfs-server
spec:
  containers:
    - name: nfs-server
      image: jsafrane/nfs-data
      ports:
        - name: nfs
          containerPort: 2049
      securityContext:
        privileged: true
pod configuration to mount nfs volume
apiVersion: v1
kind: Pod
metadata:
  name: nfs-web
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
      volumeMounts:
        # name must match the volume name below
        - name: nfs
          mountPath: "/usr/share/nginx/html"
  volumes:
    - name: nfs
      nfs:
        # FIXME: use the right hostname
        server: 192.168.3.201
        path: "/"
When I run kubectl describe pod nfs-web I get the following output, which says it was unable to mount the NFS volume. What could be the reason for that?
Name: nfs-web
Namespace: default
Image(s): nginx
Node: 192.168.1.114/192.168.1.114
Start Time: Sun, 06 Dec 2015 08:31:06 +0530
Labels: <none>
Status: Pending
Reason:
Message:
IP:
Replication Controllers: <none>
Containers:
web:
Container ID:
Image: nginx
Image ID:
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
nfs:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 192.168.3.201
Path: /
ReadOnly: false
default-token-nh698:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-nh698
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
36s 36s 1 {scheduler } Scheduled Successfully assigned nfs-web to 192.168.1.114
36s 2s 5 {kubelet 192.168.1.114} FailedMount Unable to mount volumes for pod "nfs-web_default": exit status 32
36s 2s 5 {kubelet 192.168.1.114} FailedSync Error syncing pod, skipping: exit status 32
I had the same problem, and I solved it by installing nfs-common on every Kubernetes node.
apt-get install -y nfs-common
My nodes were installed without nfs-common. Kubernetes will ask each node to mount the NFS into a specific directory to be available to the pod. As mount.nfs was not found, the mounting process failed.
Good luck!
It looks like volumes.nfs.server=192.168.3.201 is incorrectly configured on your client. It should be set to the ClusterIP address of your nfs-server Service.
I had the same issue with an NFS server that only allowed root mounts.
I fixed it by either:
a. allowing non-root users to mount NFS (on the server),
or
b. adding the following to the PersistentVolume:
mountOptions:
  - nfsvers=4.1
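For context, here is a sketch of where `mountOptions` sits in a complete NFS PersistentVolume. The name `nfs-pv`, the capacity, and the access mode are assumptions for illustration. The server address and path are the ones from the question.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv                 # hypothetical name
spec:
  capacity:
    storage: 1Gi               # assumed size
  accessModes:
    - ReadWriteMany            # assumed access mode
  # Passed to the mount command on the node that mounts the volume
  mountOptions:
    - nfsvers=4.1
  nfs:
    server: 192.168.3.201
    path: "/"
```

A pod would then use this volume through a PersistentVolumeClaim instead of the inline `nfs:` volume shown in the question.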
I fixed this issue by installing nfs-utils on the worker nodes.
In my case the issue was that I hadn't declared the host server of the NFS in the /etc/exports file. After adding an entry there for my host server, the volume worked correctly.
If you modify the file in any way, you need to restart the service too:
sudo systemctl restart nfs-kernel-server
An example of an entry in the /etc/exports file:
/var/nfs/home 192.111.222.333(rw,sync,no_subtree_check)
In my case, the issue was that the folder defined in the volume's hostPath did not exist locally. Once the folder was created on the worker node, the issue was resolved.
Warning FailedMount 3m18s kubelet Unable to attach or mount volumes: unmounted volumes=[temp-volume], unattached volumes=[nfsvol-vre-data temp1-volume consumer1-serviceaccount-token-sdfsdf nfsvol]: timed out waiting for the condition
Warning FailedMount 71s (x10 over 5m20s) kubelet MountVolume.SetUp failed for volume "temp-volume" : hostPath type check failed: /tmp/folder is not a directory
Warning FailedMount 63s kubelet Unable to attach or mount volumes: unmounted volumes=[temp-volume], unattached volumes=[nfsvol nfsvol-vre-data temp1-volume consumer1-serviceaccount-token-sdfsdf]: timed out waiting for the condition
You need to execute the following on each master and node
sudo yum install nfs-utils -y

No nodes available to schedule pods, using google container engine

I'm having an issue where a container I'd like to run doesn't appear to be getting started on my cluster.
I've tried searching around for possible solutions, but there's a surprising lack of information out there to assist with this issue or anything of its nature.
Here's the most I could gather:
$ kubectl describe pods/elasticsearch
Name: elasticsearch
Namespace: default
Image(s): my.image.host/my-project/elasticsearch
Node: /
Labels: <none>
Status: Pending
Reason:
Message:
IP:
Replication Controllers: <none>
Containers:
elasticsearch:
Image: my.image.host/my-project/elasticsearch
Limits:
cpu: 100m
State: Waiting
Ready: False
Restart Count: 0
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Mon, 19 Oct 2015 10:28:44 -0500 Mon, 19 Oct 2015 10:34:09 -0500 12 {scheduler } failedScheduling no nodes available to schedule pods
I also see this:
$ kubectl get pod elasticsearch -o wide
NAME READY STATUS RESTARTS AGE NODE
elasticsearch 0/1 Pending 0 5s
I guess I'd like to know: What prerequisites exist so that I can be confident that my container is going to run in container engine? What do I need to do in this scenario to get it running?
Here's my yml file:
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch
spec:
  containers:
    - name: elasticsearch
      image: my.image.host/my-project/elasticsearch
      ports:
        - containerPort: 9200
      resources:
      volumeMounts:
        - name: elasticsearch-data
          mountPath: /usr/share/elasticsearch
  volumes:
    - name: elasticsearch-data
      gcePersistentDisk:
        pdName: elasticsearch-staging
        fsType: ext4
Here's some more output about my node:
$ kubectl get nodes
NAME LABELS STATUS
gke-elasticsearch-staging-00000000-node-yma3 kubernetes.io/hostname=gke-elasticsearch-staging-00000000-node-yma3 NotReady
You only have one node in your cluster and its status is NotReady, so you won't be able to schedule any pods. You can try to determine why your node isn't ready by looking in /var/log/kubelet.log. You can also add new nodes to your cluster (scale the cluster size up to 2) or delete the node (it will be automatically replaced by the instance group manager) to see if either of those options gets you a working node.
It appears that the scheduler couldn't see any nodes in your cluster. You can run kubectl get nodes and gcloud compute instances list to confirm whether you have any nodes in the cluster. Did you correctly specify the number of nodes (--num-nodes) when creating the cluster?