Probes not found on $PATH - kubernetes

I am redeploying a K3s deployment from a few months ago. Back then it worked perfectly, with no problems. However, when I try deploying it now, after making some other fixes, I get the following error:
Warning Unhealthy 32m kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "8078b7c54b9bb1609451ae1c2e832ede0670f264490f6ee34e334673fd025681": OCI runtime exec failed: exec failed: unable to start container process: exec: "grpc_health_probe": executable file not found in $PATH: unknown
This is the .yaml file I am using for the deployment:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vei-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server-pod
  template:
    metadata:
      labels:
        app: server-pod
    spec:
      containers:
        - name: server-pod
          image: myname/mydeployment:latest
          env:
            - name: AWS_ACCESS_KEY_ID
              value: $AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              value: $AWS_SECRET_ACCESS_KEY
          ports:
            - name: grpc
              containerPort: 50051
          livenessProbe:
            exec:
              command:
                - grpcurl
                - -plaintext
                - localhost:50051
                - ping.Pinger/Ping
          readinessProbe:
            exec:
              command:
                - grpc_health_probe
                - -addr=:50051
The same error is thrown for every command in both the liveness probe and the readiness probe. Has something changed with respect to probes in K3s over the past few months?

Warning Unhealthy 32m kubelet Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "8078b7c54b9bb1609451ae1c2e832ede0670f264490f6ee34e334673fd025681": OCI runtime exec failed: exec failed: unable to start container process: exec: "grpc_health_probe": executable file not found in $PATH: unknown
This error can have a few different causes:
The container image you are using may be missing the grpc_health_probe executable, or the executable may be in the image but not in a directory on the container's $PATH.
As @DazWilkin pointed out, the most likely issue is that the grpc_health_probe binary is not present in your container image. Exec probes run inside the container, so every command a probe uses (here grpc_health_probe and grpcurl) has to exist in the image. To solve this, add the file to the container image or mount it into the container. Then either make sure the executable is in a directory on $PATH, or specify the full path to the executable in the probe configuration. You may also need to check that the file's permissions allow it to be executed. Once the file is present, the readiness probe should start working and your K3s deployment should come up normally.
For more info follow this doc.
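As a rough sketch of the options, assuming the server implements the standard gRPC health checking protocol and the cluster runs Kubernetes v1.24 or later (where the built-in gRPC probe is enabled by default), the exec probe can be replaced with a native grpc probe so that no extra binary is needed in the image. The commented-out alternative keeps the exec probe but uses an absolute path; /bin/grpc_health_probe is a hypothetical location, and the binary would have to be copied there when the image is built.

```yaml
# Fragment of the container spec (spec.template.spec.containers[0] in the Deployment).
readinessProbe:
  # Built-in gRPC probe: the kubelet calls the grpc.health.v1.Health service itself,
  # so grpc_health_probe does not have to exist inside the image.
  grpc:
    port: 50051
  initialDelaySeconds: 5
  periodSeconds: 10

# Alternative (assumption: the binary was copied to /bin/grpc_health_probe at build time):
# readinessProbe:
#   exec:
#     command:
#       - /bin/grpc_health_probe
#       - -addr=:50051
```

The same reasoning applies to the liveness probe: grpcurl also has to be present in the image for an exec probe to find it.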

Related

Need help referencing image correctly in minikube private registry

I'm trying to deploy a custom pod on minikube and I'm getting the following message regardless of my tweaks:
Failed to load logs: container "my-pod" in pod "my-pod-766c646c85-nbv4c" is waiting to start: image can't be pulled
Reason: BadRequest (400)
I did all sorts of experiments based on https://minikube.sigs.k8s.io/docs/handbook/pushing/ and https://number1.co.za/minikube-deploy-a-container-using-a-private-image-registry/ without success.
I ended up trying to use minikube image load myimage:latest and reference it in the container spec as:
...
containers:
  - name: my-pod
    image: myimage:latest
    ports:
      - name: my-pod
        containerPort: 8080
        protocol: TCP
...
Should/can I use minikube image?
If so, should I use the full image name docker.io/library/myimage:latest or just the image suffix myimage:latest?
Is there anything else I need to do to make minikube locate the image?
Is there a way to get the logs of the bad request itself to see what is going on (I don't see anything in the api server logs)?
I also see the following error in the minikube system:
Failed to load logs: container "registry-creds" in pod "registry-creds-6b884645cf-gkgph" is waiting to start: ContainerCreating
Reason: BadRequest (400)
Thanks!
Amos
You should set the imagePullPolicy to IfNotPresent. Changing that will tell kubernetes to not pull the image if it does not need to.
...
containers:
  - name: my-pod
    image: myimage:latest
    imagePullPolicy: IfNotPresent
    ports:
      - name: my-pod
        containerPort: 8080
        protocol: TCP
...
A quirk of kubernetes is that if you specify an image with the latest tag as you have here, it will default to using imagePullPolicy=Always, which is why you are seeing this error.
More on how kubernetes decides the default image pull policy
If you need your image to always be pulled in production, consider using helm to template your kubernetes yaml configuration.

Unable to upload a file through a deployment yaml in kubernetes

I am unable to upload a file through a deployment YAML in Kubernetes.
The deployment YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: test
          image: openjdk:14
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: testing
              mountPath: "/usr/src/myapp/docker.jar"
          workingDir: "/usr/src/myapp"
          command: ["java"]
          args: ["-jar", "docker.jar"]
      volumes:
        - hostPath:
            path: "C:\\Users\\user\\Desktop\\kubernetes\\docker.jar"
            type: File
          name: testing
I get the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19s default-scheduler Successfully assigned default/test-64fb7fbc75-mhnnj to minikube
Normal Pulled 13s (x3 over 15s) kubelet Container image "openjdk:14" already present on machine
Warning Failed 12s (x3 over 14s) kubelet Error: Error response from daemon: invalid mode: /usr/src/myapp/docker.jar
When I remove the volumeMount it runs with the error unable to access docker.jar.
volumeMounts:
  - name: testing
    mountPath: "/usr/src/myapp/docker.jar"
This is a community wiki answer. Feel free to expand it.
That is a known issue with Docker on Windows. Right now it is not possible to correctly mount Windows directories as volumes.
You could try some of the workarounds mentioned by @CodeWizard in this github thread like here or here.
Also, if you are using VirtualBox, you might want to check this solution:
On Windows, you cannot directly map a Windows directory to your
container, because your containers reside inside a VirtualBox VM.
So your docker -v command actually maps the directory between the VM
and the container.
So you have to do it in two steps:
Map a Windows directory to the VM through VirtualBox manager
Map a directory in your container to the directory in your VM
You better use the Kitematic UI to help you. It is much easier.
Alternatively, you can deploy your setup on a Linux environment to avoid these kinds of issues entirely.

Error in deploy stage: "lchmod (file attributes) error: Not supported"

I am attempting to deploy an image "casbin-role-backend" to the cloud, but it always fails.
The following is from the log:
Preparing to start the job...
Pipeline image: latest
Preparing the build artifacts...
lchmod (file attributes) error: Not supported
.....
DEPLOYING using manifest
+++ kubectl apply --namespace default -f ./tmp.deployment.yaml
deployment.apps/casbin-role-backend unchanged
The Service "casbin-role-backend" is invalid: spec.ports[0].nodePort: Invalid value: 30080: provided port is already allocated
+++ set +x
CHECKING deployment rollout of casbin-role-backend
+++ kubectl rollout status deploy/casbin-role-backend --watch=true --timeout=150s --namespace default
error: deployment "casbin-role-backend" exceeded its progress deadline
+++ STATUS=fail
+++ set +x
SHOWING last events
LAST SEEN TYPE REASON OBJECT MESSAGE
41m Warning Failed pod/casbin-role-mgt-ui-7d59b6d4cf-2pbhm Error: InvalidImageName
2m11s Warning InspectFailed pod/casbin-role-backend-68d76464dd-vbvch Failed to apply default image tag "//:": couldn't parse image reference "//:": invalid reference format
...
DEPLOYMENT FAILED
....
OK
Finished: FAILED
And below is my deployment.yaml:
apiVersion: v1
kind: Service
metadata:
  name: casbin-role-backend
  labels:
    app: app
spec:
  type: NodePort
  ports:
    - port: 3000
      name: http
      nodePort: 30080
  selector:
    app: app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: casbin-role-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: casbin-role-backend
          image: xxx/casbin-role-backend
          ports:
            - containerPort: 3000
Does anybody know what this error is? I have searched for some time but still cannot find what it is or how to fix it.
Update:
The source code is originated from below, and I added Dockerfile and deployment.yaml to deploy it on k8s.
https://github.com/alikhan866/Casbin-Role-Mgt-Dashboard-RBAC
Dockerfile source:
# pull official base image
FROM node:13.12.0-alpine
# set working directory
WORKDIR /dist
# add `/app/node_modules/.bin` to $PATH
ENV PATH /app/node_modules/.bin:$PATH
# install app dependencies
COPY package.json ./
COPY package-lock.json ./
RUN npm install
# add app
COPY . ./
# start app
CMD ["npm", "run dev"]
I see two issues here:
1.
The Service "casbin-role-backend" is invalid: spec.ports[0].nodePort: Invalid value: 30080: provided port is already allocated
It means that the port used by the nodePort service is already in use. You can list these services with: kubectl get svc --all-namespaces | grep '30080' and change the port value or delete the service. Also, make sure that you specify the proper namespace.
2.
2m11s Warning InspectFailed pod/casbin-role-backend-68d76464dd-vbvch Failed to apply default image tag "//:": couldn't parse image reference "//:": invalid reference format
My educated guess here is that your image name is invalid because it starts with https:// or ://. A proper image name should look like this:
image: registry_host/organization_name/image_name:image_version

Chaining container error in Kubernetes

I am new to kubernetes and docker. I am trying to chain 2 containers in a pod such that the second container should not be up until the first one is running. I searched and got a solution here. It says to add "depends" field in YAML file for the container which is dependent on another container. Following is a sample of my YAML file:
apiVersion: v1beta4
kind: Pod
metadata:
  name: test
  labels:
    apps: test
spec:
  containers:
    - name: container1
      image: <image-name>
      ports:
        - containerPort: 8080
          hostPort: 8080
    - name: container2
      image: <image-name>
      depends: ["container1"]
Kubernetes gives me following error after running the above yaml file:
Error from server (BadRequest): error when creating "new.yaml": Pod in version "v1beta4" cannot be handled as a Pod: no kind "Pod" is registered for version "v1beta4"
Is the apiVersion problem here? I even tried v1, apps/v1, extensions/v1 but got following errors (respectively):
error: error validating "new.yaml": error validating data: ValidationError(Pod.spec.containers[1]): unknown field "depends" in io.k8s.api.core.v1.Container; if you choose to ignore these errors, turn validation off with --validate=false
error: unable to recognize "new.yaml": no matches for apps/, Kind=Pod
error: unable to recognize "new.yaml": no matches for extensions/, Kind=Pod
What am I doing wrong here?
As I understand there is no field called depends in the Pod Specification.
You can verify and validate by following command:
kubectl explain pod.spec --recursive
I have attached a link to understand the structure of the k8s resources.
kubectl-explain
There is no property "depends" in the Container API object.
You could split your containers into two different pods and have kubectl wait for the first pod to become ready before creating the second:
kubectl create -f container1.yaml
kubectl wait --for=condition=Ready pod/<container1-pod-name> --timeout=120s # block until the pod is ready
kubectl create -f container2.yaml
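If both containers have to stay in one pod, one possible alternative is the native sidecar pattern. This is only a sketch, assuming a cluster recent enough to support it (the feature is enabled by default from Kubernetes v1.29). container1 is declared as an init container with restartPolicy: Always, so it keeps running, and the kubelet starts container2 only after container1's startup probe has succeeded. The TCP check on port 8080 is an assumption about how container1 signals that it is up.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    apps: test
spec:
  initContainers:
    - name: container1
      image: <image-name>
      restartPolicy: Always      # marks this init container as a long-running sidecar
      ports:
        - containerPort: 8080
          hostPort: 8080
      startupProbe:              # container2 is not started until this probe succeeds
        tcpSocket:
          port: 8080
        periodSeconds: 2
        failureThreshold: 30
  containers:
    - name: container2
      image: <image-name>
```

On older clusters, a plain init container only helps when the first step runs to completion, because regular init containers must exit before the main containers start.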

My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log

This is what I keep getting:
[root@centos-master ~]# kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
nfs-server-h6nw8   1/1       Running            0          1h
nfs-web-07rxz      0/1       CrashLoopBackOff   8          16m
nfs-web-fdr9h      0/1       CrashLoopBackOff   8          16m
Below is output from describe pods
kubectl describe pods
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
16m 16m 1 {default-scheduler } Normal Scheduled Successfully assigned nfs-web-fdr9h to centos-minion-2
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id d56f34ae4e8f
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id d56f34ae4e8f
16m 16m 2 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "web" with CrashLoopBackOff: "Back-off 10s restarting failed container=web pod=nfs-web-fdr9h_default(461c937d-d870-11e6-98de-005056040cc2)"
I have two pods, nfs-web-07rxz and nfs-web-fdr9h, but if I run kubectl logs nfs-web-07rxz, with or without the -p option, I don't see any logs for either pod.
[root@centos-master ~]# kubectl logs nfs-web-07rxz -p
[root@centos-master ~]# kubectl logs nfs-web-07rxz
This is my replicationController yaml file:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-web
spec:
  replicas: 2
  selector:
    role: web-frontend
  template:
    metadata:
      labels:
        role: web-frontend
    spec:
      containers:
        - name: web
          image: eso-cmbu-docker.artifactory.eng.vmware.com/demo-container:demo-version3.0
          ports:
            - name: web
              containerPort: 80
          securityContext:
            privileged: true
My Docker image was made from this simple docker file:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get install -y nfs-common
I am running my kubernetes cluster on CentOs-1611, kube version:
[root@centos-master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
If I run the Docker image with docker run, it runs without any issue; I only get the crash through Kubernetes.
Can someone help me out? How can I debug this without seeing any logs?
As @Sukumar commented, your Dockerfile needs a command to run (CMD or ENTRYPOINT), or your ReplicationController needs to specify a command.
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
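For the image in the question, which only installs nginx and nfs-common, one way to do this is to give the container a foreground command in the ReplicationController. This is a sketch rather than the poster's confirmed fix; it assumes nginx is on the image's PATH and that serving with nginx is what the container is meant to do.

```yaml
# Pod template section of the ReplicationController (spec.template.spec).
spec:
  containers:
    - name: web
      image: eso-cmbu-docker.artifactory.eng.vmware.com/demo-container:demo-version3.0
      # Run nginx in the foreground so the container's main process never exits.
      command: ["nginx", "-g", "daemon off;"]
      ports:
        - name: web
          containerPort: 80
      securityContext:
        privileged: true
```

The equivalent change in the Dockerfile would be a CMD line with the same three arguments.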
# Show details of specific pod
kubectl describe pod <pod name> -n <namespace-name>
# View logs for specific pod
kubectl logs <pod name> -n <namespace-name>
If your application is slow to bootstrap, the crash could be related to the initial values of the readiness/liveness probes. I solved my problem by increasing initialDelaySeconds to 120 seconds, as my Spring Boot application does a lot of initialization. The documentation does not mention that the default is 0 (https://kubernetes.io/docs/api-reference/v1.9/#probe-v1-core).
service:
  livenessProbe:
    httpGet:
      path: /health/local
      scheme: HTTP
      port: 8888
    initialDelaySeconds: 120
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10
  readinessProbe:
    httpGet:
      path: /admin/health
      scheme: HTTP
      port: 8642
    initialDelaySeconds: 150
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 10
A very good explanation about those values is given by What is the default value of initialDelaySeconds.
The health or readiness check algorithm works like this:
wait for initialDelaySeconds
perform the check and wait up to timeoutSeconds for a response
if the number of consecutive successes reaches successThreshold, return success
if the number of consecutive failures reaches failureThreshold, return failure; otherwise wait periodSeconds and start a new check
In my case, my application can now bootstrap predictably, so I no longer get a periodic CrashLoopBackOff from startup times that sat right at the limit of those settings.
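To make the settings in that algorithm concrete, here is an annotated probe with every timing field written out. The defaults noted in the comments are the documented Kubernetes defaults; the path and port are placeholders and are not taken from any real application.

```yaml
livenessProbe:
  httpGet:
    path: /healthz            # hypothetical endpoint
    port: 8080                # hypothetical port
  initialDelaySeconds: 120    # default 0: wait this long before the first check
  periodSeconds: 5            # default 10: interval between checks
  timeoutSeconds: 5           # default 1: no answer within this time counts as a failure
  successThreshold: 1         # default 1: must be 1 for liveness probes
  failureThreshold: 10        # default 3: consecutive failures before the container is restarted
```

With these values the container gets roughly 120 + 10 × 5 = 170 seconds from start before the kubelet gives up and restarts it.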
I needed to keep a pod running for subsequent kubectl exec calls and, as the comments above pointed out, my pod was being killed by my k8s cluster because it had finished running all its tasks. I managed to keep my pod running by starting it with a command that never exits on its own, as in:
kubectl run YOUR_POD_NAME -n YOUR_NAMESPACE --image SOME_PUBLIC_IMAGE:latest --command -- tail -f /dev/null
My pod kept crashing and I was unable to find the cause. Luckily, Kubernetes keeps a record of the events that occurred before my pod crashed.
To see these events, sorted by timestamp, run:
kubectl get events --sort-by=.metadata.creationTimestamp
Add a --namespace mynamespace argument to the command if needed.
The events in the output showed me why my pod kept crashing.
According to this page, the container runs everything correctly and then exits because all of its commands have finished, and Kubernetes treats that exit as a crash. Either run your services in the foreground, or create a keep-alive script. Kubernetes will then show that your application is running. Note that this problem is not encountered in a plain Docker environment; it is only Kubernetes that expects a long-running app.
Update (an example):
Here's how to avoid CrashLoopBackOff, when launching a Netshoot container:
kubectl run netshoot --image nicolaka/netshoot -- sleep infinity
In your yaml file, add command and args lines:
...
containers:
  - name: api
    image: localhost:5000/image-name
    command: [ "sleep" ]
    args: [ "infinity" ]
...
Works for me.
I observed the same issue and fixed it by adding the command and args block to the yaml file. Here is a sample of my yaml file for reference:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: ubuntu
  name: ubuntu
  namespace: default
spec:
  containers:
    - image: gcr.io/ow/hellokubernetes/ubuntu
      imagePullPolicy: Never
      name: ubuntu
      resources:
        requests:
          cpu: 100m
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
As mentioned in above posts, the container exits upon creation.
If you want to test this without using a yaml file, you can pass the sleep command to the kubectl create deployment statement. The double hyphen -- indicates a command, which is equivalent of command: in a Pod or Deployment yaml file.
The below command creates a deployment for debian with sleep 1234, so it doesn't exit immediately.
kubectl create deployment deb --image=debian:buster-slim -- "sh" "-c" "while true; do sleep 1234; done"
You then can create a service etc, or, to test the container, you can kubectl exec -it <pod-name> -- sh (or -- bash) into the container you just created to test it.
I solved this problem by increasing the memory resources:
resources:
  limits:
    cpu: 1
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 250Mi
In my case the problem was what Steve S. mentioned:
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
Namely I had a Java application whose main threw an exception (and something overrode the default uncaught exception handler so that nothing was logged). The solution was to put the body of main into try { ... } catch and print out the exception. Thus I could find out what was wrong and fix it.
(Another cause could be something in the app calling System.exit; you could use a custom SecurityManager with an overridden checkExit to prevent (or log the caller of) exit; see https://stackoverflow.com/a/5401319/204205.)
While troubleshooting the same issue I found no logs when using kubectl logs <pod_id>.
So I SSHed into the node instance to try running the container with plain docker. To my surprise, this also failed.
When entering the container with:
docker run -it faulty:latest /bin/sh
and poking around, I found that it wasn't the latest version.
A faulty version of the docker image was already present on the instance.
When I removed the faulty:latest image with:
docker rmi faulty:latest
everything started to work.
I had same issue and now I finally resolved it. I am not using docker-compose file.
I just added this line in my Docker file and it worked.
ENV CI=true
Reference:
https://github.com/GoogleContainerTools/skaffold/issues/3882
Try rerunning the pod and running
kubectl get pods --watch
to watch the status of the pod as it progresses.
In my case, I would only see the end result, 'CrashLoopBackOff,' but the docker container ran fine locally. So I watched the pods using the above command, and I saw the container briefly progress into an OOMKilled state, which meant to me that it required more memory.
In my case this error was specific to the hello-world docker image. I used the nginx image instead of the hello-world image and the error was resolved.
I solved this problem by removing the space between the quotes and the command value inside the array. The container was exiting right after starting because there was no valid executable command to run inside it.
['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
I had a similar issue, which was solved when I corrected my zookeeper.yaml file: the service name did not match the container name in the deployment. Making them the same resolved it.
apiVersion: v1
kind: Service
metadata:
  name: zk1
  namespace: nbd-mlbpoc-lab
  labels:
    app: zk-1
spec:
  ports:
    - name: client
      port: 2181
      protocol: TCP
    - name: follower
      port: 2888
      protocol: TCP
    - name: leader
      port: 3888
      protocol: TCP
  selector:
    app: zk-1
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: zk-deployment
  namespace: nbd-mlbpoc-lab
spec:
  template:
    metadata:
      labels:
        app: zk-1
    spec:
      containers:
        - name: zk1
          image: digitalwonderland/zookeeper
          ports:
            - containerPort: 2181
          env:
            - name: ZOOKEEPER_ID
              value: "1"
            - name: ZOOKEEPER_SERVER_1
              value: zk1
In my case, the issue was a misconstrued list of command-line arguments. I was doing this in my deployment file:
...
args:
- "--foo 10"
- "--bar 100"
Instead of the correct approach:
...
args:
- "--foo"
- "10"
- "--bar"
- "100"
I finally found the cause when I executed the 'docker run xxx' command and got the error there. It is caused by incomplete-platform.
It seems there could be a lot of reasons why a Pod should be in crashloopbackoff state.
In my case, one of the container was terminating continuously due to the missing Environment value.
So, the best way to debug is to -
1. check Pod description output i.e. kubectl describe pod abcxxx
2. check the events generated related to the Pod i.e. kubectl get events| grep abcxxx
3. Check if End-points have been created for the Pod i.e. kubectl get ep
4. Check if dependent resources have been in-place e.g. CRDs or configmaps or any other resource that may be required.
kubectl logs -f POD will only produce logs from a running container. Add --previous to the command to get logs from a previous container. This is used mainly for debugging. Hope this helps.