Whitelisting sysctl parameters for helm chart - kubernetes

I have a helm chart that deploys an app but also needs to reconfigure some sysctl parameters in order to run properly. When I install the helm chart and run kubectl describe pod/pod_name on the pod that was deployed, I get forbidden sysctl: "kernel.sem" not whitelisted. I have added a podsecuritypolicy like so but with no such luck.
apiVersion:policy/v1beta1
kind:PodSecurityPolicy
metadata:
name: policy
spec:
allowedUnsafeSysctls:
- kernel.sem
- kernel.shmmax
- kernel.shmall
- fs.mqueue.msg_max
seLinux:
rule: 'RunAsAny'
runAsUser:
rule: 'RunAsAny'
supplementalGroups:
rule: 'RunAsAny'
fsGroup:
rule:'RunAsAny'
---UPDATE---
I also try to set the kubelet parameters via a config file in order to allow-unsafe-ctls but I get an error no kind "KubeletConfiguration" is registered for version "kubelet.config.k8s.io/v1beta1".
Here's the configuration file:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
allowedUnsafeSysctls:
- "kernel.sem"
- "kernel.shmmax"
- "kernel.shmall"
- "fs.mqueue.msg_max"

The kernel.sem sysctl is considered as unsafe sysctl, therefore is disabled by default (only safe sysctls are enabled by default). You can allow one or more unsafe sysctls on a node-by-node basics, to do so you need to add --allowed-unsafe-sysctls flag to the kubelet.
Look at "Enabling Unsafe Sysctls"
I've created simple example to illustrate you how it works.
First I added --allowed-unsafe-sysctls flag to the kubelet.
In my case I use kubeadm, so I need to add this flag to
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf file:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --allowed-unsafe-sysctls=kernel.sem"
...
NOTE: You have to add this flag on every node you want to run Pod with kernel.sem enabled.
Then I reloaded systemd manager configuration and restarted kubelet using below command:
# systemctl daemon-reload && systemctl restart kubelet
Next I created a simple Pod using this manifest file:
apiVersion: v1
kind: Pod
metadata:
labels:
run: web
name: web
spec:
securityContext:
sysctls:
- name: kernel.sem
value: "250 32000 100 128"
containers:
- image: nginx
name: web
Finally we can check if it works correctly:
# sysctl -a | grep "kernel.sem"
kernel.sem = 32000 1024000000 500 32000 // on the worker node
# kubectl get pod
NAME READY STATUS RESTARTS AGE
web 1/1 Running 0 110s
# kubectl exec -it web -- bash
root#web:/# cat /proc/sys/kernel/sem
250 32000 100 128 // inside the Pod
Your PodSecurityPolicy doesn't work as expected, because of as you can see in the documentation:
Warning: If you allow unsafe sysctls via the allowedUnsafeSysctls field in a PodSecurityPolicy, any pod using such a sysctl will fail to start if the sysctl is not allowed via the --allowed-unsafe-sysctls kubelet flag as well on that node.

Related

how to restrict a pod to connect only to 2 pods using networkpolicy and test connection in k8s in simple way?

Do I still need to expose pod via clusterip service?
There are 3 pods - main, front, api. I need to allow ingress+egress connection to main pod only from the pods- api and frontend. I also created service-main - service that exposes main pod on port:80.
I don't know how to test it, tried:
k exec main -it -- sh
netcan -z -v -w 5 service-main 80
and
k exec main -it -- sh
curl front:80
The main.yaml pod:
apiVersion: v1
kind: Pod
metadata:
labels:
app: main
item: c18
name: main
spec:
containers:
- image: busybox
name: main
command:
- /bin/sh
- -c
- sleep 1d
The front.yaml:
apiVersion: v1
kind: Pod
metadata:
labels:
app: front
name: front
spec:
containers:
- image: busybox
name: front
command:
- /bin/sh
- -c
- sleep 1d
The api.yaml
apiVersion: v1
kind: Pod
metadata:
labels:
app: api
name: api
spec:
containers:
- image: busybox
name: api
command:
- /bin/sh
- -c
- sleep 1d
The main-to-front-networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: front-end-policy
spec:
podSelector:
matchLabels:
app: main
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: front
ports:
- port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: front
ports:
- port: 8080
What am I doing wrong? Do I still need to expose main pod via service? But should not network policy take care of this already?
Also, do I need to write containerPort:80 in main pod? How to test connectivity and ensure ingress-egress works only for main pod to api, front pods?
I tried the lab from ckad prep course, it had 2 pods: secure-pod and web-pod. There was issue with connectivity, the solution was to create network policy and test using netcat from inside the web-pod's container:
k exec web-pod -it -- sh
nc -z -v -w 1 secure-service 80
connection open
UPDATE: ideally I want answers to these:
a clear explanation of the diff btw service and networkpolicy.
If both service and netpol exist - what is the order of evaluation that the traffic/request goes thru? It first goes thru netpol then service? Or vice versa?
if I want front and api pods to send/receive traffic to main - do I need separate services exposing front and api pods?
Network policies and services are two different and independent Kubernetes resources.
Service is:
An abstract way to expose an application running on a set of Pods as a network service.
Good explanation from the Kubernetes docs:
Kubernetes Pods are created and destroyed to match the state of your cluster. Pods are nonpermanent resources. If you use a Deployment to run your app, it can create and destroy Pods dynamically.
Each Pod gets its own IP address, however in a Deployment, the set of Pods running in one moment in time could be different from the set of Pods running that application a moment later.
This leads to a problem: if some set of Pods (call them "backends") provides functionality to other Pods (call them "frontends") inside your cluster, how do the frontends find out and keep track of which IP address to connect to, so that the frontend can use the backend part of the workload?
Enter Services.
Also another good explanation in this answer.
For production you should use a workload resources instead of creating pods directly:
Pods are generally not created directly and are created using workload resources. See Working with Pods for more information on how Pods are used with workload resources.
Here are some examples of workload resources that manage one or more Pods:
Deployment
StatefulSet
DaemonSet
And use services to make requests to your application.
Network policies are used to control traffic flow:
If you want to control traffic flow at the IP address or port level (OSI layer 3 or 4), then you might consider using Kubernetes NetworkPolicies for particular applications in your cluster.
Network policies target pods, not services (an abstraction). Check this answer and this one.
Regarding your examples - your network policy is correct (as I tested it below). The problem is that your cluster may not be compatible:
For Network Policies to take effect, your cluster needs to run a network plugin which also enforces them. Project Calico or Cilium are plugins that do so. This is not the default when creating a cluster!
Test on kubeadm cluster with Calico plugin -> I created similar pods as you did, but I changed container part:
spec:
containers:
- name: main
image: nginx
command: ["/bin/sh","-c"]
args: ["sed -i 's/listen .*/listen 8080;/g' /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"]
ports:
- containerPort: 8080
So NGINX app is available at the 8080 port.
Let's check pods IP:
user#shell:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
api 1/1 Running 0 48m 192.168.156.61 example-ubuntu-kubeadm-template-2 <none> <none>
front 1/1 Running 0 48m 192.168.156.56 example-ubuntu-kubeadm-template-2 <none> <none>
main 1/1 Running 0 48m 192.168.156.52 example-ubuntu-kubeadm-template-2 <none> <none>
Let's exec into running main pod and try to make request to the front pod:
root#main:/# curl 192.168.156.61:8080
<!DOCTYPE html>
...
<title>Welcome to nginx!</title>
It is working.
After applying your network policy:
user#shell:~$ kubectl apply -f main-to-front.yaml
networkpolicy.networking.k8s.io/front-end-policy created
user#shell:~$ kubectl exec -it main -- bash
root#main:/# curl 192.168.156.61:8080
...
Not working anymore, so it means that network policy is applied successfully.
Nice option to get more information about applied network policy is to run kubectl describe command:
user#shell:~$ kubectl describe networkpolicy front-end-policy
Name: front-end-policy
Namespace: default
Created on: 2022-01-26 15:17:58 +0000 UTC
Labels: <none>
Annotations: <none>
Spec:
PodSelector: app=main
Allowing ingress traffic:
To Port: 8080/TCP
From:
PodSelector: app=front
Allowing egress traffic:
To Port: 8080/TCP
To:
PodSelector: app=front
Policy Types: Ingress, Egress

How to access kube-scheduler on a kubernetes cluster?

I'm trying to figure out how to configure the kubernetes scheduler using a custom config but I'm having a bit of trouble understanding exactly how the scheduler is accessible.
The scheduler runs as a pod under the kube-system namespace called kube-scheduler-it-k8s-master. The documentation says that you can configure the scheduler by creating a config file and calling kube-scheduler --config <filename>. However I am not able to access the scheduler container directly as running kubectl exec -it kube-scheduler-it-k8s-master -- /bin/bash returns:
OCI runtime exec failed: exec failed: container_linux.go:370: starting container process caused: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
command terminated with exit code 126
I tried modifying /etc/kubernetes/manifests/kube-scheduler to mount my custom config file within the pod and explicitly call kube-scheduler with the --config option set, but it seems that my changes get reverted and the scheduler runs using the default settings.
I feel like I'm misunderstanding something fundamentally about the kubernetes scheduler. Am I supposed to pass in the custom scheduler config from within the scheduler pod itself? Or is this supposed to be done remotely somehow?
Thanks!
Since your X problem is "how to modify scheduler configuration", you can try the following for it.
Using kubeadm
If you are using kubeadm to bootstrap the cluster, you can use --config flag while running kubeadm init to pass a custom configuration object of type ClusterConfiguration to pass extra arguments to control plane components.
Example config for scheduler:
$ cat sched.conf
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
scheduler:
extraArgs:
address: 0.0.0.0
config: /home/johndoe/schedconfig.yaml
kubeconfig: /home/johndoe/kubeconfig.yaml
$ kubeadm init --config sched.conf
You could also try kubeadm upgrade apply --config sched.conf <k8s version> to apply updated config on a live cluster.
Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/
Updating static pod manifest
You could also edit /etc/kubernetes/manifests/kube-scheduler.yaml, modify the flags to pass the config. Make sure you mount the file into the pod by updating volumes and volumeMounts section.
spec:
containers:
- command:
- kube-scheduler
- --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
- --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
- --bind-address=127.0.0.1
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --leader-elect=true
- --config=/etc/kubernetes/mycustomconfig.conf
volumeMounts:
- mountPath: /etc/kubernetes/scheduler.conf
name: kubeconfig
readOnly: true
- mountPath: /etc/kubernetes/mycustomconfig.conf
name: customconfig
readOnly: true
volumes:
- hostPath:
path: /etc/kubernetes/scheduler.conf
type: FileOrCreate
name: kubeconfig
- hostPath:
path: /etc/kubernetes/mycustomconfig.conf
type: FileOrCreate
name: customconfig

Applying API Server App ID to k8s cluster spec

Team,
I already have a cluster running and I need to update the OIDC value. is there a way I can do it without having to recreate the cluster?
ex: below is my cluster info and I need to update the oidcClientID: spn:
How can I do this as I have 5 masters running?
kubeAPIServer:
storageBackend: etcd3
oidcClientID: spn:45645hhh-f641-498d-b11a-1321231231
oidcUsernameClaim: upn
oidcUsernamePrefix: "oidc:"
oidcGroupsClaim: groups
oidcGroupsPrefix: "oidc:"
You update your kube-apiserver on your masters one by one (update/restart). If your cluster is setup correctly, when you get to the active kube-apiserver it should automatically failover to another kube-apiserver master in standby.
You can add the oidc options in the /etc/kubernetes/manifests/kube-apiserver.yaml pod manifest file.
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
component: kube-apiserver
tier: control-plane
name: kube-apiserver
namespace: kube-system
spec:
containers:
- command:
- kube-apiserver
- --authorization-mode=Node,RBAC
- --advertise-address=172.x.x.x
- --allow-privileged=true
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --oidc-client-id=...
- --oidc-username-claim=...
- --oidc-username-prefix=...
- --oidc-groups-claim=...
- --oidc-groups-prefix=...
...
Then you can restart your kube-apiserver container, if you are using docker:
$ sudo docker restart <container-id-for-kube-apiserver>
Or if you'd like to restart all the components on the master:
$ sudo systemctl restart docker
Watch for logs on the kube-apiserver container
$ sudo docker logs -f <container-id-for-kube-apiserver>
Make sure you never have less running nodes than your quorum which should be 3 for your 5 master cluster, to be safe. If for some reason your etcd cluster falls out of quorum you will have to recover by recreating the etcd cluster and restoring from a backup.

Using sysctls in Google Kubernetes Engine (GKE)

I'm running a k8s cluster - 1.9.4-gke.1 - on Google Kubernetes Engine (GKE).
I need to set sysctl net.core.somaxconn to a higher value inside some containers.
I've found this official k8s page: Using Sysctls in a Kubernetes Cluster - that seemed to solve my problem. The solution was to make an annotation on my pod spec like the following:
annotations:
security.alpha.kubernetes.io/sysctls: net.core.somaxconn=1024
But when I tried to create my pod:
Status: Failed
Reason: SysctlForbidden
Message: Pod forbidden sysctl: "net.core.somaxconn" not whitelisted
So I've tried to create a PodSecurityPolicy like the following:
---
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
name: sites-psp
annotations:
security.alpha.kubernetes.io/sysctls: 'net.core.somaxconn'
spec:
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
runAsUser:
rule: RunAsAny
fsGroup:
rule: RunAsAny
volumes:
- '*'
... but it didn't work either.
I've also found that I can use a kubelet argument on every node to whitelist the specific sysctl: --experimental-allowed-unsafe-sysctls=net.core.somaxconn
I've added this argument to the KUBELET_TEST_ARGS setting on my GCE machine and restarted it. From what I can see from the output of ps command, it seems that the option was successfully added to the kubelet process on the startup:
/home/kubernetes/bin/kubelet --v=2 --kube-reserved=cpu=60m,memory=960Mi --experimental-allowed-unsafe-sysctls=net.core.somaxconn --allow-privileged=true --cgroup-root=/ --cloud-provider=gce --cluster-dns=10.51.240.10 --cluster-domain=cluster.local --pod-manifest-path=/etc/kubernetes/manifests --experimental-mounter-path=/home/kubernetes/containerized_mounter/mounter --experimental-check-node-capabilities-before-mount=true --cert-dir=/var/lib/kubelet/pki/ --enable-debugging-handlers=true --bootstrap-kubeconfig=/var/lib/kubelet/bootstrap-kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --anonymous-auth=false --authorization-mode=Webhook --client-ca-file=/etc/srv/kubernetes/pki/ca-certificates.crt --cni-bin-dir=/home/kubernetes/bin --network-plugin=kubenet --volume-plugin-dir=/home/kubernetes/flexvolume --node-labels=beta.kubernetes.io/fluentd-ds-ready=true,cloud.google.com/gke-nodepool=temp-pool --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true
The problem is that I keep receiving a message telling me that my pod cannot be started because sysctl net.core.somaxconn is not whitelisted.
Is there some limitation on GKE so that I cannot whitelist a sysctl? Am I doing something wrong?
Until sysctl support becomes better integrated you can put this in your pod spec
spec:
initContainers:
- name: sysctl-buddy
image: busybox:1.29
securityContext:
privileged: true
command: ["/bin/sh"]
args:
- -c
- sysctl -w net.core.somaxconn=4096 vm.overcommit_memory=1
resources:
requests:
cpu: 1m
memory: 1Mi
This is an intentional Kubernetes limitation. There is an open PR to add net.core.somaxconn to the whitelist here: https://github.com/kubernetes/kubernetes/pull/54896
As far as I know, there isn't a way to override this behavior on GKE.

My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log

This is what I keep getting:
[root#centos-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-h6nw8 1/1 Running 0 1h
nfs-web-07rxz 0/1 CrashLoopBackOff 8 16m
nfs-web-fdr9h 0/1 CrashLoopBackOff 8 16m
Below is output from describe pods
kubectl describe pods
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
16m 16m 1 {default-scheduler } Normal Scheduled Successfully assigned nfs-web-fdr9h to centos-minion-2
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id 495fcbb06836
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Started Started container with docker id d56f34ae4e8f
16m 16m 1 {kubelet centos-minion-2} spec.containers{web} Normal Created Created container with docker id d56f34ae4e8f
16m 16m 2 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "web" with CrashLoopBackOff: "Back-off 10s restarting failed container=web pod=nfs-web-fdr9h_default(461c937d-d870-11e6-98de-005056040cc2)"
I have two pods: nfs-web-07rxz, nfs-web-fdr9h, but if I do kubectl logs nfs-web-07rxz or with -p option I don't see any log in both pods.
[root#centos-master ~]# kubectl logs nfs-web-07rxz -p
[root#centos-master ~]# kubectl logs nfs-web-07rxz
This is my replicationController yaml file:
replicationController yaml file
apiVersion: v1 kind: ReplicationController metadata: name: nfs-web spec: replicas: 2 selector:
role: web-frontend template:
metadata:
labels:
role: web-frontend
spec:
containers:
- name: web
image: eso-cmbu-docker.artifactory.eng.vmware.com/demo-container:demo-version3.0
ports:
- name: web
containerPort: 80
securityContext:
privileged: true
My Docker image was made from this simple docker file:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get install -y nfs-common
I am running my kubernetes cluster on CentOs-1611, kube version:
[root#centos-master ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.0", GitCommit:"86dc49aa137175378ac7fba7751c3d3e7f18e5fc", GitTreeState:"clean", BuildDate:"2016-12-15T16:57:18Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
If I run the docker image by docker run I was able to run the image without any issue, only through kubernetes I got the crash.
Can someone help me out, how can I debug without seeing any log?
As #Sukumar commented, you need to have your Dockerfile have a Command to run or have your ReplicationController specify a command.
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
#Show details of specific pod
kubectl describe pod <pod name> -n <namespace-name>
# View logs for specific pod
kubectl logs <pod name> -n <namespace-name>
If you have an application that takes slower to bootstrap, it could be related to the initial values of the readiness/liveness probes. I solved my problem by increasing the value of initialDelaySeconds to 120s as my SpringBoot application deals with a lot of initialization. The documentation does not mention the default 0 (https://kubernetes.io/docs/api-reference/v1.9/#probe-v1-core)
service:
livenessProbe:
httpGet:
path: /health/local
scheme: HTTP
port: 8888
initialDelaySeconds: 120
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 10
readinessProbe:
httpGet:
path: /admin/health
scheme: HTTP
port: 8642
initialDelaySeconds: 150
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 10
A very good explanation about those values is given by What is the default value of initialDelaySeconds.
The health or readiness check algorithm works like:
wait for initialDelaySeconds
perform check and wait timeoutSeconds for a timeout
if the number of continued successes is greater than successThreshold return success
if the number of continued failures is greater than failureThreshold return failure otherwise wait periodSeconds and start a new check
In my case, my application can now bootstrap in a very clear way, so that I know I will not get periodic crashloopbackoff because sometimes it would be on the limit of those rates.
I had the need to keep a pod running for subsequent kubectl exec calls and as the comments above pointed out my pod was getting killed by my k8s cluster because it had completed running all its tasks. I managed to keep my pod running by simply kicking the pod with a command that would not stop automatically as in:
kubectl run YOUR_POD_NAME -n YOUR_NAMESPACE --image SOME_PUBLIC_IMAGE:latest --command tailf /dev/null
My pod kept crashing and I was unable to find the cause. Luckily there is a space where kubernetes saves all the events that occurred before my pod crashed.
(#List Events sorted by timestamp)
To see these events run the command:
kubectl get events --sort-by=.metadata.creationTimestamp
make sure to add a --namespace mynamespace argument to the command if needed
The events shown in the output of the command showed my why my pod kept crashing.
From This page, the container dies after running everything correctly but crashes because all the commands ended. Either you make your services run on the foreground, or you create a keep alive script. By doing so, Kubernetes will show that your application is running. We have to note that in the Docker environment, this problem is not encountered. It is only Kubernetes that wants a running app.
Update (an example):
Here's how to avoid CrashLoopBackOff, when launching a Netshoot container:
kubectl run netshoot --image nicolaka/netshoot -- sleep infinity
In your yaml file, add command and args lines:
...
containers:
- name: api
image: localhost:5000/image-name
command: [ "sleep" ]
args: [ "infinity" ]
...
Works for me.
I observed the same issue, and added the command and args block in yaml file. I am copying sample of my yaml file for reference
apiVersion: v1
kind: Pod
metadata:
labels:
run: ubuntu
name: ubuntu
namespace: default
spec:
containers:
- image: gcr.io/ow/hellokubernetes/ubuntu
imagePullPolicy: Never
name: ubuntu
resources:
requests:
cpu: 100m
command: ["/bin/sh"]
args: ["-c", "while true; do echo hello; sleep 10;done"]
dnsPolicy: ClusterFirst
enableServiceLinks: true
As mentioned in above posts, the container exits upon creation.
If you want to test this without using a yaml file, you can pass the sleep command to the kubectl create deployment statement. The double hyphen -- indicates a command, which is equivalent of command: in a Pod or Deployment yaml file.
The below command creates a deployment for debian with sleep 1234, so it doesn't exit immediately.
kubectl create deployment deb --image=debian:buster-slim -- "sh" "-c" "while true; do sleep 1234; done"
You then can create a service etc, or, to test the container, you can kubectl exec -it <pod-name> -- sh (or -- bash) into the container you just created to test it.
I solved this problem I increased memory resource
resources:
limits:
cpu: 1
memory: 1Gi
requests:
cpu: 100m
memory: 250Mi
In my case the problem was what Steve S. mentioned:
The pod is crashing because it starts up then immediately exits, thus Kubernetes restarts and the cycle continues.
Namely I had a Java application whose main threw an exception (and something overrode the default uncaught exception handler so that nothing was logged). The solution was to put the body of main into try { ... } catch and print out the exception. Thus I could find out what was wrong and fix it.
(Another cause could be something in the app calling System.exit; you could use a custom SecurityManager with an overridden checkExit to prevent (or log the caller of) exit; see https://stackoverflow.com/a/5401319/204205.)
Whilst troubleshooting the same issue I found no logs when using kubeclt logs <pod_id>.
Therefore I ssh:ed in to the node instance to try to run the container using plain docker. To my surprise this failed also.
When entering the container with:
docker exec -it faulty:latest /bin/sh
and poking around I found that it wasn't the latest version.
A faulty version of the docker image was already available on the instance.
When I removed the faulty:latest instance with:
docker rmi faulty:latest
everything started to work.
I had same issue and now I finally resolved it. I am not using docker-compose file.
I just added this line in my Docker file and it worked.
ENV CI=true
Reference:
https://github.com/GoogleContainerTools/skaffold/issues/3882
Try rerunning the pod and running
kubectl get pods --watch
to watch the status of the pod as it progresses.
In my case, I would only see the end result, 'CrashLoopBackOff,' but the docker container ran fine locally. So I watched the pods using the above command, and I saw the container briefly progress into an OOMKilled state, which meant to me that it required more memory.
In my case this error was specific to the hello-world docker image. I used the nginx image instead of the hello-world image and the error was resolved.
i solved this problem by removing space between quotes and command value inside of array ,this is happened because container exited after started and no executable command present which to be run inside of container.
['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
I had similar issue but got solved when I corrected my zookeeper.yaml file which had the service name mismatch with file deployment's container names. It got resolved by making them same.
apiVersion: v1
kind: Service
metadata:
name: zk1
namespace: nbd-mlbpoc-lab
labels:
app: zk-1
spec:
ports:
- name: client
port: 2181
protocol: TCP
- name: follower
port: 2888
protocol: TCP
- name: leader
port: 3888
protocol: TCP
selector:
app: zk-1
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
name: zk-deployment
namespace: nbd-mlbpoc-lab
spec:
template:
metadata:
labels:
app: zk-1
spec:
containers:
- name: zk1
image: digitalwonderland/zookeeper
ports:
- containerPort: 2181
env:
- name: ZOOKEEPER_ID
value: "1"
- name: ZOOKEEPER_SERVER_1
value: zk1
In my case, the issue was a misconstrued list of command-line arguments. I was doing this in my deployment file:
...
args:
- "--foo 10"
- "--bar 100"
Instead of the correct approach:
...
args:
- "--foo"
- "10"
- "--bar"
- "100"
I finally found the solution when I execute 'docker run xxx ' command ,and I got the error then.It is caused by incomplete-platform .
It seems there could be a lot of reasons why a Pod should be in crashloopbackoff state.
In my case, one of the container was terminating continuously due to the missing Environment value.
So, the best way to debug is to -
1. check Pod description output i.e. kubectl describe pod abcxxx
2. check the events generated related to the Pod i.e. kubectl get events| grep abcxxx
3. Check if End-points have been created for the Pod i.e. kubectl get ep
4. Check if dependent resources have been in-place e.g. CRDs or configmaps or any other resource that may be required.
kubectl logs -f POD, will only produce logs from a running container. Suffix --previous to the command to get logs from a previous container. Used maily for debugging. Hope this helps.