Pod labeling not possible from within pod using Kubernetes on Docker-EE - kubernetes

We are using a Apache-Kafka deployment on Kubernetes which is based on the ability to label pods after they have been created (see https://github.com/Yolean/kubernetes-kafka). The init container of the broker pods takes advantage of this feature to set a label on itself with its own numeric index (e.g. "0", "1", etc) as value. The label is used in the service descriptors to select exactly one pod.
This approach works fine on our DIND-Kubernetes environment. However, when tried to port the deployment onto a Docker-EE Kubernetes environment we ran into trouble because the command kubectl label pod generates a run time error which is completely misleading (also see https://github.com/fabric8io/kubernetes-client/issues/853).
In order to verify the run time error in a minimal setup we created the following deployment scripts.
First step: Successfully label pod using the Docker-EE-Host
# create a simple pod as a test target for labeling
> kubectl run -ti -n default --image alpine sh
# get the pod name for all further steps
> kubectl -n default get pods
NAME READY STATUS RESTARTS AGE
nfs-provisioner-7d49cdcb4f-8qx95 1/1 Running 1 7d
nginx-deployment-76dcc8c697-ng4kb 1/1 Running 1 7d
nginx-deployment-76dcc8c697-vs24j 1/1 Running 0 20d
sh-777f6db646-hrm65 1/1 Running 0 3m <--- This is the test pod
test-76bbdb4654-9wd9t 1/1 Running 2 6d
test4-76dbf847d5-9qck2 1/1 Running 0 5d
# get client and server versions
> kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5",
GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean",
BuildDate:"2018-06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.11- docker-8d637ae", GitCommit:"8d637aedf46b9c21dde723e29c645b9f27106fa5",
GitTreeState:"clean", BuildDate:"2018-04-26T16:51:21Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
# set label
kubectl -n default label pod sh-777f6db646-hrm65 "mylabel=hallo"
pod "sh-777f6db646-hrm65" labeled <---- successful execution
Everything works fine as expected.
Second step: Reproduce run-time error from within pod
Create Docker image containing kubectl 1.10.5
FROM debian:stretch-
slim#sha256:ea42520331a55094b90f6f6663211d4f5a62c5781673935fe17a4dfced777029
ENV KUBERNETES_VERSION=1.10.5
RUN set -ex; \
export DEBIAN_FRONTEND=noninteractive; \
runDeps='curl ca-certificates procps netcat'; \
buildDeps=''; \
apt-get update && apt-get install -y $runDeps $buildDeps --no-install- recommends; \
rm -rf /var/lib/apt/lists/*; \
\
curl -sLS -o k.tar.gz -k https://dl.k8s.io/v${KUBERNETES_VERSION}/kubernetes-client-linux-amd64.tar.gz; \
tar -xvzf k.tar.gz -C /usr/local/bin/ --strip-components=3 kubernetes/client/bin/kubectl; \
rm k.tar.gz; \
\
apt-get purge -y --auto-remove $buildDeps; \
rm /var/log/dpkg.log /var/log/apt/*.log
This image is deployed as 10.100.180.74:5000/test/kubectl-client-1.10.5 in a site local registry and will be referred to below.
Create a pod using the container above
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
name: pod-labeler
namespace: default
spec:
selector:
matchLabels:
app: pod-labeler
replicas: 1
serviceName: pod-labeler
updateStrategy:
type: OnDelete
template:
metadata:
labels:
app: pod-labeler
annotations:
spec:
terminationGracePeriodSeconds: 10
containers:
- name: check-version
image: 10.100.180.74:5000/test/kubectl-client-1.10.5
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
value: sh-777f6db646-hrm65
command: ["/usr/local/bin/kubectl", "version" ]
- name: label-pod
image: 10.100.180.74:5000/test/kubectl-client-1.10.5
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
value: sh-777f6db646-hrm65
command: ["/bin/bash", "-c", "/usr/local/bin/kubectl -n default label pod $POD_NAME 'mylabel2=hallo'" ]
Logging output
We get the following logging output
# Log of the container "check-version"
2018-07-18T11:11:10.791011157Z Client Version: version.Info{Major:"1",
Minor:"10", GitVersion:"v1.10.5",
GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean",
BuildDate:"2018-\
06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
2018-07-18T11:11:10.791058997Z Server Version: version.Info{Major:"1",
Minor:"8+", GitVersion:"v1.8.11-docker-8d637ae",
GitCommit:"8d637aedf46b9c21dde723e29c645b9f27106fa5", GitTreeState:"clean",
BuildDate:"2018-04-26T16:51:21Z", GoVersion:"go1.8.3", Compiler:"gc",
Platform:"linux/amd64"}
and the run time error
2018-07-18T11:24:15.448695813Z The Pod "sh-777f6db646-hrm65" is invalid:
spec.tolerations: Forbidden: existing toleration can not be modified except its tolerationSeconds
Notes
This is not an authorization problem since we've given the default user of the default namespace full administrative rights. In case we don't, we get an error message referring to missing permissions.
Both client and servers versions "outside" (e.g on the docker host) and "inside" (e.g. the pod) are identical down to the GIT commit tag
We are using version 3.0.2 of the Universal Control Plane
Any ideas?

It was pointed out in one of the comments that the issue may be caused by a missing permission even though the error message does not insinuate so. We officially filed a ticket with Docker and actually got exactly this result: In order to be able to set/modify a label from within a pod the default user of the namespace must be given the "Scheduler" role on the swarm resource (which later shows up as \ in the GUI). Granting this permission fixes the problem. See added grant in Docker-EE-GUI below.
From my point of view, this is far from obvious. The Docker support representative offered to investigate if this is actually expected behavior or results from a bug. As soon as we learn more on this question I will include it into our answer.
As for using more debugging output: Unfortunately, adding --v=9 to the calls of kubectl does not return any useful information. There's too much output to be displayed here but the overall logging is very similar in both cases: It consists of a lot GET API requests which are all successful followed by a final PATCH API request which succeeds in the one case but fails in the other as described above.

Related

Create deployments with kubectl version 1.18 +

At page 67 of Kubernetes: Up and Running, 2nd Edition, the author uses the command below in order to create a Deployment:
kubectl run alpaca-prod \
--image=gcr.io/kuar-demo/kuard-amd64:blue \
--replicas=2 \
--labels="ver=1,app=alpaca,env=prod"
However this command is deprecated with kubectl 1.19+, and it now creates a pod:
$ kubectl run alpaca-prod \
--image=gcr.io/kuar-demo/kuard-amd64:blue \
--replicas=2 \
--labels="ver=1,app=alpaca,env=prod"
Flag --replicas has been deprecated, has no effect and will be removed in the future.
pod/alpaca-prod created
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Is there a way to use kubectl run to create a deployment with replicas and custom label with kubectl 1.19+?
It is now preferred to use kubectl create to create a new Deployment, instead of kubectl run.
This is the corresponsing command to your kubectl run
kubectl create deployment alpaca-prod --image=gcr.io/kuar-demo/kuard-amd64:blue --replicas=2
Labels
By default from kubectl create deployment alpaca-proc you will get the label app=alpaca.
The get the other labels, you need to add them later. Use kubectl label to add labels to the Deployment, e.g.
kubectl label deployment alpaca-prod ver=1
Note: this only adds the label to the Deployment and not to the Pod-template, e.g. the Pods will not get the label. To also add the label to the pods, you need to edit the template: part of the Deployment-yaml.
Note: With kubectl version 1.18 things have changed. Like its no longer possible to use kubectl run to create Jobs, CronJobs or Deployments, only Pods still work.
So yes you cannot create a deployment from kubectl run from 1.18.
Step 1: Create deployment from kubectl create command
kubectl create deploy alpaca-prod --image=gcr.io/kuar-demo/kuard-amd64:blue --replicas=2
Step 2 Update labels with kubectl label command
kubectl label deploy -l app=alpaca-prod ver=1
kubectl label deploy -l app=alpaca-prod app=alpaca
kubectl label deploy -l app=alpaca-prod env=prod
Here is the yaml file which produces the expected result for p67 of 'Kubernetes: Up and Running, 2nd Edition':
apiVersion: apps/v1
kind: Deployment
metadata:
name: alpaca-prod
spec:
selector:
matchLabels:
ver: "1"
app: "alpaca"
env: "prod"
replicas: 2
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
ver: "1"
app: "alpaca"
env: "prod"
spec:
containers:
- name: kuard
image: gcr.io/kuar-demo/kuard-amd64:blue

Unknown field "setHostnameAsFQDN" despite using latest kubectl client

I have a deployment yaml file that looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
name: hello-kubernetes
spec:
replicas: 1
selector:
matchLabels:
app: hello-kubernetes
template:
metadata:
labels:
app: hello-kubernetes
spec:
setHostnameAsFQDN: true
hostname: hello
subdomain: world
containers:
- name: hello-kubernetes
image: redis
However, I am getting this error:
$ kubectl apply -f dep.yaml
error: error validating "dep.yaml": error validating data: ValidationError(Deployment.spec.template.spec): unknown field "setHostnameAsFQDN" in io.k8s.api.core.v1.PodSpec; if you choose to ignore these errors, turn validation off with --validate=false
My kubectl version:
$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
After specifying --validate=falsee, hostname and hostname -f still return different values.
I believe I missunderstood something. Doc says that setHostnameAsFQDN will be available from kubernetes v1.20
You showed kubectl version. Your kubernetes version also need to be v1.20. Make sure you are using kubernetes version v1.20.
Use kubectl version for seeing both client and server version. Where client version refers to kubectl version and server version refers to kubernetes version.
As far the k8s v1.20 release note doc: Previously introduced in 1.19 behind a feature gate, SetHostnameAsFQDN is now enabled by default. More details on this behavior is available in documentation for DNS for Services and Pods

Getting git to work in a kubernetes container when using runAsUser

I'd like to run git as part of an initContainer in a Kubernetes pod. I'd also like the containers to run as an arbitrary non-root user. Is there any way to make this happen?
The problem is that if I include something like this in the pod description:
securityContext:
runAsUser: 1234
Then git will fail like this:
No user exists for uid 1234
fatal: Could not read from remote repository.
The error comes from ssh. The obvious workaround is to clone via an https:// url rather than using ssh, but I'm relying on a deploy key for read-only access to the remote repository.
I can't bake the user into the image because I don't know what the user id will be until runtime.
I can't add the user to /etc/passwd at runtime because the container is running as a non-root user.
How are other folks handling this sort of situation?
In case someone asks:
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.9+k3s1", GitCommit:"630bebf94b9dce6b8cd3d402644ed023b3af8f90", GitTreeState:"clean", BuildDate:"2020-09-17T19:05:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
You can use securityContext at container level instead of pod level.
apiVersion: v1
kind: Pod
metadata:
name: security-context-demo
spec:
securityContext:
runAsUser: 1000
initContainers:
- name: sec-ctx-in-init
image: busybox
securityContext:
runAsUser: 2000
command: ['sh', '-c', 'id']
containers:
- name: sec-ctx-in-container
image: gcr.io/google-samples/node-hello:1.0
securityContext:
runAsUser: 3000
allowPrivilegeEscalation: false
After creating the pod (I use katacoda), you can run:
master $ kubectl logs security-context-demo -c sec-ctx-in-init
uid=2000 gid=0(root)
master $ kubectl exec -it security-context-demo -c sec-ctx-in-container -- id
uid=3000 gid=0(root) groups=0(root)
Notice: if you are actually sharing files between container and initcontainer, you need to specify the same fsGroup at pod level securitycontext.

EKS Kubernetes user with RBAC seen as system:anonymous

I've been following this post to create user access to my kubernetes cluster (running on Amazon EKS). I did create key, csr, approved the request and downloaded the certificate for the user. Then I did create a kubeconfig file with the key and crt. When I run kubectl with this kubeconfig, I'm recognized as system:anonymous.
$ kubectl --kubeconfig test-user-2.kube.yaml get pods
Error from server (Forbidden): pods is forbidden: User "system:anonymous" cannot list pods in the namespace "default"
I expected the user to be recognized but get denied access.
$ kubectl --kubeconfig test-user-2.kube.yaml version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-18T11:37:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-28T20:13:43Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl --kubeconfig test-user-2.kube.yaml config view
apiVersion: v1
clusters:
- cluster:
insecure-skip-tls-verify: true
server: REDACTED
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: test-user-2
name: kubernetes
current-context: kubernetes
kind: Config
preferences: {}
users:
- name: test-user-2
user:
client-certificate-data: REDACTED
client-key-data: REDACTED
# running with my other account (which uses heptio-authenticator-aws)
$ kubectl describe certificatesigningrequest.certificates.k8s.io/user-request-test-user-2
Name: user-request-test-user-2
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 01 Aug 2018 15:20:15 +0200
Requesting User:
Status: Approved,Issued
Subject:
Common Name: test-user-2
Serial Number:
Events: <none>
I did create a ClusterRoleBinding with admin (also tried cluster-admin) roles for this user but that should not matter for this step. I'm not sure how I can further debug 1) if the user is created or not or 2) if I missed some configuration.
Any help is appreciated!
As mentioned in this article:
When you create an Amazon EKS cluster, the IAM entity user or role (for example, for federated users) that creates the cluster is automatically granted system:master permissions in the cluster's RBAC configuration. To grant additional AWS users or roles the ability to interact with your cluster, you must edit the aws-auth ConfigMap within Kubernetes.
Check if you have aws-auth ConfigMap applied to your cluster:
kubectl describe configmap -n kube-system aws-auth
If ConfigMap is present, skip this step and proceed to step 3.
If ConfigMap is not applied yet, you should do the following:
Download the stock ConfigMap:
curl -O https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/aws-auth-cm.yaml
Adjust it using your NodeInstanceRole ARN in the rolearn: . To get NodeInstanceRole value check out this manual and you will find it at steps 3.8 - 3.10.
data:
mapRoles: |
- rolearn: <ARN of instance role (not instance profile)>
Apply this config map to the cluster:
kubectl apply -f aws-auth-cm.yaml
Wait for cluster nodes becoming Ready:
kubectl get nodes --watch
Edit aws-auth ConfigMap and add users to it according to the example below:
kubectl edit -n kube-system configmap/aws-auth
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
mapRoles: |
- rolearn: arn:aws:iam::555555555555:role/devel-worker-nodes-NodeInstanceRole-74RF4UBDUKL6
username: system:node:{{EC2PrivateDNSName}}
groups:
- system:bootstrappers
- system:nodes
mapUsers: |
- userarn: arn:aws:iam::555555555555:user/admin
username: admin
groups:
- system:masters
- userarn: arn:aws:iam::111122223333:user/ops-user
username: ops-user
groups:
- system:masters
Save and exit the editor.
Create kubeconfig for your IAM user following this manual.
I got this back from AWS support today.
Thanks for your patience. I have just heard back from the EKS team. They have confirmed that the aws-iam-authenticator has to be used with EKS and, because of that, it is not possible to authenticate using certificates.
I haven't heard whether this is expected to be supported in the future, but it is definitely broken at the moment.
This seems to be a limitation of EKS. Even though the CSR is approved, user can not authenticate. I used the same procedure on another kubernetes cluster and it worked fine.

My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log

This is what I keep getting:
[root#centos-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-h6nw8 1/1 Running 0 1h
nfs-web-07rxz 0/1 CrashLoopBackOff 8 16m
nfs-web-fdr9h 0/1 CrashLoopBackOff 8 16m
Below is output from describe pods
kubectl describe pods
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
16m 16m 1 {default-scheduler } Normal Scheduled Successfully assigned nfs-web-fdr9h to centos-minion-2
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id d56f34ae4e8f
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id d56f34ae4e8f
16m 16m 2 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "web" with CrashLoopBackOff: "Back-off 10s restarting failed container=web pod=nfs-web-fdr9h_default(461c937d-d870-11e6-98de-005056040cc2)"
I have two pods: nfs-web-07rxz, nfs-web-fdr9h, but if I do kubectl logs nfs-web-07rxz or with -p option I don't see any log in both pods.
[root#centos-master ~]# kubectl logs nfs-web-07rxz -p
[root#centos-master ~]# kubectl logs nfs-web-07rxz
This is my replicationController yaml file:
replicationController yaml file
apiVersion: v1 kind: ReplicationController metadata: name: nfs-web spec: replicas: 2 selector:
role: web-frontend template:
metadata:
labels:
role: web-frontend
spec:
containers:
- name: web
image: eso-cmbu-docker.artifactory.eng.vmware.com/demo-container:demo-version3.0
ports:
- name: web
containerPort: 80
securityContext:
privileged: true
My Docker image was made from this simple docker file:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get install -y nfs-common
I am running my kubernetes cluster on CentOs-1611, kube version:
[root#centos-master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
If I run the docker image by docker run I was able to run the image without any issue, only through kubernetes I got the crash.
Can someone help me out, how can I debug without seeing any log?
As #Sukumar commented, you need to have your Dockerfile have a Command to run or have your ReplicationController specify a command.
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
#Show details of specific pod
kubectl describe pod <pod name> -n <namespace-name>
# View logs for specific pod
kubectl logs <pod name> -n <namespace-name>
If you have an application that takes slower to bootstrap, it could be related to the initial values of the readiness/liveness probes. I solved my problem by increasing the value of initialDelaySeconds to 120s as my SpringBoot application deals with a lot of initialization. The documentation does not mention the default 0 (https://kubernetes.io/docs/api-reference/v1.9/#probe-v1-core)
service:
livenessProbe:
httpGet:
path: /health/local
scheme: HTTP
port: 8888
initialDelaySeconds: 120
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 10
readinessProbe:
httpGet:
path: /admin/health
scheme: HTTP
port: 8642
initialDelaySeconds: 150
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 10
A very good explanation about those values is given by What is the default value of initialDelaySeconds.
The health or readiness check algorithm works like:
wait for initialDelaySeconds
perform check and wait timeoutSeconds for a timeout
if the number of continued successes is greater than successThreshold return success
if the number of continued failures is greater than failureThreshold return failure otherwise wait periodSeconds and start a new check
In my case, my application can now bootstrap in a very clear way, so that I know I will not get periodic crashloopbackoff because sometimes it would be on the limit of those rates.
I had the need to keep a pod running for subsequent kubectl exec calls and as the comments above pointed out my pod was getting killed by my k8s cluster because it had completed running all its tasks. I managed to keep my pod running by simply kicking the pod with a command that would not stop automatically as in:
kubectl run YOUR_POD_NAME -n YOUR_NAMESPACE --image SOME_PUBLIC_IMAGE:latest --command tailf /dev/null
My pod kept crashing and I was unable to find the cause. Luckily there is a space where kubernetes saves all the events that occurred before my pod crashed.
(#List Events sorted by timestamp)
To see these events run the command:
kubectl get events --sort-by=.metadata.creationTimestamp
make sure to add a --namespace mynamespace argument to the command if needed
The events shown in the output of the command showed my why my pod kept crashing.
From This page, the container dies after running everything correctly but crashes because all the commands ended. Either you make your services run on the foreground, or you create a keep alive script. By doing so, Kubernetes will show that your application is running. We have to note that in the Docker environment, this problem is not encountered. It is only Kubernetes that wants a running app.
Update (an example):
Here's how to avoid CrashLoopBackOff, when launching a Netshoot container:
kubectl run netshoot --image nicolaka/netshoot -- sleep infinity
In your yaml file, add command and args lines:
...
containers:
- name: api
image: localhost:5000/image-name
command: [ "sleep" ]
args: [ "infinity" ]
...
Works for me.
I observed the same issue, and added the command and args block in yaml file. I am copying sample of my yaml file for reference
apiVersion: v1
kind: Pod
metadata:
labels:
run: ubuntu
name: ubuntu
namespace: default
spec:
containers:
- image: gcr.io/ow/hellokubernetes/ubuntu
imagePullPolicy: Never
name: ubuntu
resources:
requests:
cpu: 100m
command: ["/bin/sh"]
args: ["-c", "while true; do echo hello; sleep 10;done"]
dnsPolicy: ClusterFirst
enableServiceLinks: true
As mentioned in above posts, the container exits upon creation.
If you want to test this without using a yaml file, you can pass the sleep command to the kubectl create deployment statement. The double hyphen -- indicates a command, which is equivalent of command: in a Pod or Deployment yaml file.
The below command creates a deployment for debian with sleep 1234, so it doesn't exit immediately.
kubectl create deployment deb --image=debian:buster-slim -- "sh" "-c" "while true; do sleep 1234; done"
You then can create a service etc, or, to test the container, you can kubectl exec -it <pod-name> -- sh (or -- bash) into the container you just created to test it.
I solved this problem I increased memory resource
resources:
limits:
cpu: 1
memory: 1Gi
requests:
cpu: 100m
memory: 250Mi
In my case the problem was what Steve S. mentioned:
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
Namely I had a Java application whose main threw an exception (and something overrode the default uncaught exception handler so that nothing was logged). The solution was to put the body of main into try { ... } catch and print out the exception. Thus I could find out what was wrong and fix it.
(Another cause could be something in the app calling System.exit; you could use a custom SecurityManager with an overridden checkExit to prevent (or log the caller of) exit; see https://stackoverflow.com/a/5401319/204205.)
Whilst troubleshooting the same issue I found no logs when using kubeclt logs <pod_id>.
Therefore I ssh:ed in to the node instance to try to run the container using plain docker. To my surprise this failed also.
When entering the container with:
docker exec -it faulty:latest /bin/sh
and poking around I found that it wasn't the latest version.
A faulty version of the docker image was already available on the instance.
When I removed the faulty:latest instance with:
docker rmi faulty:latest
everything started to work.
I had same issue and now I finally resolved it. I am not using docker-compose file.
I just added this line in my Docker file and it worked.
ENV CI=true
Reference:
https://github.com/GoogleContainerTools/skaffold/issues/3882
Try rerunning the pod and running
kubectl get pods --watch
to watch the status of the pod as it progresses.
In my case, I would only see the end result, 'CrashLoopBackOff,' but the docker container ran fine locally. So I watched the pods using the above command, and I saw the container briefly progress into an OOMKilled state, which meant to me that it required more memory.
In my case this error was specific to the hello-world docker image. I used the nginx image instead of the hello-world image and the error was resolved.
i solved this problem by removing space between quotes and command value inside of array ,this is happened because container exited after started and no executable command present which to be run inside of container.
['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
I had similar issue but got solved when I corrected my zookeeper.yaml file which had the service name mismatch with file deployment's container names. It got resolved by making them same.
apiVersion: v1
kind: Service
metadata:
name: zk1
namespace: nbd-mlbpoc-lab
labels:
app: zk-1
spec:
ports:
- name: client
port: 2181
protocol: TCP
- name: follower
port: 2888
protocol: TCP
- name: leader
port: 3888
protocol: TCP
selector:
app: zk-1
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
name: zk-deployment
namespace: nbd-mlbpoc-lab
spec:
template:
metadata:
labels:
app: zk-1
spec:
containers:
- name: zk1
image: digitalwonderland/zookeeper
ports:
- containerPort: 2181
env:
- name: ZOOKEEPER_ID
value: "1"
- name: ZOOKEEPER_SERVER_1
value: zk1
In my case, the issue was a misconstrued list of command-line arguments. I was doing this in my deployment file:
...
args:
- "--foo 10"
- "--bar 100"
Instead of the correct approach:
...
args:
- "--foo"
- "10"
- "--bar"
- "100"
I finally found the solution when I execute 'docker run xxx ' command ,and I got the error then.It is caused by incomplete-platform .
It seems there could be a lot of reasons why a Pod should be in crashloopbackoff state.
In my case, one of the container was terminating continuously due to the missing Environment value.
So, the best way to debug is to -
1. check Pod description output i.e. kubectl describe pod abcxxx
2. check the events generated related to the Pod i.e. kubectl get events| grep abcxxx
3. Check if End-points have been created for the Pod i.e. kubectl get ep
4. Check if dependent resources have been in-place e.g. CRDs or configmaps or any other resource that may be required.
kubectl logs -f POD, will only produce logs from a running container. Suffix --previous to the command to get logs from a previous container. Used maily for debugging. Hope this helps.