Regarding K8s scheduling (kube-scheduler)

Regarding the kubelet status and the kube-scheduler's policy:
The kubelets running on my eight workers were in Ready status when I spawned containers via a ReplicationController.
The scheduler reported that the RC's pods were scheduled across all eight worker nodes, but the pod status stayed Pending.
I waited long enough for the image to download, but the state never changed to Running. So I restarted the kubelet service on a worker that had a pending pod, and then all of that worker's pending pods changed to the Running state.
Scheduled well (pod) -> Pending (pod) -> restart kubelet -> Running (pod)
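For reference, the restart step was roughly the following (a sketch, assuming a systemd-managed kubelet):

# on the affected worker node
sudo systemctl restart kubelet

# back on the master, watch the pods transition
kubectl get pods -o wide --watch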
Why was it resolved by restarting the kubelet?
The kubelet log looks like this:
factory.go:71] Error trying to work out if we can handle /docker-daemon/docker: error inspecting container: No such container: docker
factory.go:71] Error trying to work out if we can handle /docker: error inspecting container: No such container: docker
factory.go:71] Error trying to work out if we can handle /: error inspecting container: unexpected end of JSON input
factory.go:71] Error trying to work out if we can handle /docker-daemon: error inspecting container: No such container: docker-daemon
factory.go:71] Error trying to work out if we can handle /kubelet: error inspecting container: No such container: kubelet
factory.go:71] Error trying to work out if we can handle /kube-proxy: error inspecting container: No such container: kube-proxy
Another symptom is shown in the picture below.
The scheduled pod works fine, but the condition in the middle of the picture is False
(taken from "kubectl describe ~~")
It's working fine but shows False... what does that False mean?
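For what it's worth, that condition can also be listed directly; a Ready condition of False generally means the kubelet has not reported the container as ready yet. A minimal sketch, assuming a pod named my-pod (hypothetical name):

kubectl get pod my-pod -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'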
Thanks

Related

My pods get SIGTERM and exit gracefully via the signal handler, but I am unable to find the root cause of why the kubelet sends SIGTERM to my pods

My pods are getting SIGTERM automatically for an unknown reason. Finding the root cause of why the kubelet sends SIGTERM to my pods is what I need in order to fix the issue.
When I run kubectl describe podname -n namespace, only a Killing event is present under the events section. I don't see any Unhealthy status before the kill event.
Is there any way to debug further with the pod's events, or any specific log files where we can find a trace of the reason for the SIGTERM?
I tried to run kubectl describe on the Killing event, but there seems to be no such command to drill down into events further.
Any other approach to debugging this issue is appreciated. Thanks in advance!
kubectl describe pods snippet
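One way to dig further, as a sketch (the namespace and pod name are placeholders, and events expire after a retention window, by default one hour):

# list the recorded events for the pod, oldest first
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'

# the kubelet log on the node that ran the pod often records why the kill was issued
journalctl -u kubelet --since "1 hour ago" | grep <pod-name>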
Could you please share the YAML of your deployment so we can try to replicate your problem?
Based on your attached screenshot, it looks like your readiness probe failed to complete repeatedly (it didn't run and fail, it failed to complete entirely), and therefore the cluster killed it.
Without knowing what your Docker image is doing, it's hard to debug further from here.
As a first point of debugging, you can try doing kubectl logs -f -n {namespace} {pod-name} to see what the pod is doing and seeing if it's erroring there.
The error Client.Timeout exceeded while waiting for headers implies your container is proxying something? So perhaps what you're trying to proxy upstream isn't responding.
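If the probe is simply timing out, one thing to try is giving it a more generous timeout and failure budget; a sketch of the container's readinessProbe section (the path and port are assumptions, not taken from the original spec):

readinessProbe:
  httpGet:
    path: /healthz   # assumed health endpoint
    port: 8080       # assumed container port
  initialDelaySeconds: 10
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 6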

OKD unable to pull larger images from internal registry right after deployment of microservices through Jenkins X

I am trying to deploy microservices in OKD through Jenkins X, and the deployment is successful every time.
But the pods go into an "ImagePullBackOff" error right after deployment and only come into the Running state after the pods are deleted.
ImagePullBackOff Error:
Events:
The images are being pulled from OKD's internal registry, and the image is about 1.25 GB in size. The images are available in the internal registry while the pod is trying to pull them.
I came across the "image-pull-progress-deadline" field, which is supposed to be updated in "/etc/origin/node/node-config.yaml" on all the nodes. I updated it on all the nodes but am still facing the same "ImagePullBackOff" error.
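For reference, the relevant part of node-config.yaml ends up looking roughly like this (a sketch; the flag goes under kubeletArguments, and the 30m deadline value is an assumption, not a recommended setting):

# /etc/origin/node/node-config.yaml (excerpt)
kubeletArguments:
  image-pull-progress-deadline:
  - "30m"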
I tried to restart the kubelet service, but that fails with a kubelet.service not found error:
[master ~]$ sudo systemctl status kubelet
Unit kubelet.service could not be found.
Please let me know whether a restart of the kubelet service is necessary, and I'd appreciate any suggestions to resolve the "ImagePullBackOff" issue.
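Note that on OKD 3.x the kubelet runs inside the node service rather than as a standalone kubelet.service, so (worth verifying on your version) the restart would look something like:

sudo systemctl status origin-node
sudo systemctl restart origin-node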

fluentd daemon set container for papertrail failing to start in kubernetes cluster

I'm trying to set up fluentd in a Kubernetes cluster to aggregate logs in Papertrail, as per the documentation provided here.
The configuration file is fluentd-daemonset-papertrail.yaml
It basically creates a DaemonSet for the fluentd container and a ConfigMap for the fluentd configuration.
When I apply the configuration, the pod is assigned to a node and the container is created. However, it either doesn't complete its initialization or the pod gets killed immediately after it starts.
As the pods are getting killed, I'm losing the logs too, so I couldn't investigate the cause of the issue.
Looking through the events for the kube-system namespace shows the errors below:
Error: failed to start container "fluentd": Error response from daemon: OCI runtime create failed: container_linux.go:338: creating new parent process caused "container_linux.go:1897: running lstat on namespace path \"/proc/75026/ns/ipc\" caused \"lstat /proc/75026/ns/ipc: no such file or directory\"": unknown
Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9559643bf77e29d270c23bddbb17a9480ff126b0b6be10ba480b558a0733161c" network for pod "fluentd-papertrail-b9t5b": NetworkPlugin kubenet failed to set up pod "fluentd-papertrail-b9t5b_kube-system" network: Error adding container to network: failed to open netns "/proc/111610/ns/net": failed to Statfs "/proc/111610/ns/net": no such file or directory
I'm not sure what's causing these errors. I'd appreciate any help in understanding and troubleshooting them.
Also, is it possible to look at logs/events that could tell us why a pod was given a terminate signal?
Please ensure that /etc/cni/net.d and its /opt/cni/bin friend both exist and are correctly populated with the CNI configuration files and binaries on all Nodes.
Take a look: sandbox.
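A quick way to check, as a sketch to run on each node (the two directories should contain the CNI configuration files and plugin binaries, respectively):

ls -l /etc/cni/net.d/
ls -l /opt/cni/bin/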
With help from the Papertrail support team, I was able to resolve the issue by removing the entry below from the manifest file.
kubernetes.io/cluster-service: "true"
The above annotation seems to have been deprecated.
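In other words, the metadata in the DaemonSet manifest ends up looking roughly like this after the change (a sketch; the surrounding names are assumptions, not the exact contents of fluentd-daemonset-papertrail.yaml):

metadata:
  name: fluentd-papertrail        # assumed name
  namespace: kube-system
  labels:
    k8s-app: fluentd-papertrail   # assumed label; other entries unchanged
    # kubernetes.io/cluster-service: "true"   <- this line was removed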
Relevant github issues:
https://github.com/fluent/fluentd-kubernetes-daemonset/issues/296
https://github.com/kubernetes/kubernetes/issues/72757

Broken parameters persisting in Deis deployments

An invalid command parameter got into the deployment for a worker process in a Deis app. Now, whenever I run a deis pull for a new image, this broken parameter gets passed to the deployment, so the worker doesn't start up successfully.
If I go into kubectl, I can see the following parameter set in the deployment for the worker (at path /spec/template/spec/containers/0):
"command": [
"/bin/bash",
"-c"
],
Which results in the pod not starting up properly:
Error: failed to start container "worker": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
Error syncing pod
Back-off restarting failed container
This means that for every release/pull I've been going in and manually removing that parameter from the worker deployment setup. I've run kubectl delete deployment and recreated it with valid json (kubectl create -f deployment.json). This fixes things until I run deis pull again, at which point the broken parameter is back.
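A slightly less manual workaround for that per-release cleanup is to patch the field out instead of deleting and recreating the deployment; a sketch, assuming the deployment is named worker (the name is a guess):

kubectl patch deployment worker --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]'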
My thinking is that the broken command parameter is persisted somewhere in the Deis database or similar, and that it gets reapplied when I run deis pull.
I've tried the troubleshooting guide and dug around in deis-database, but I can't find where the deployment for the worker process is created, or where the deployment parameters that get passed to Kubernetes when you run a deis pull come from.
Running deis v2.10.0 on Google Cloud

Pod Containers Keep on Restarting

Container getting killed on the node after pod creation.
The issue was raised on GitHub and I was asked to move it to SO:
https://github.com/kubernetes/kubernetes/issues/24241
However, I am summarizing my issue here. After creating the pod, it doesn't run, since I have to specify the container image in the kubelet args under --pod-infra-container-image, as mentioned below.
I solved the issue of pods stuck in the ContainerCreating status by adding the image to "--pod-infra-container-image=", and then pod creation was successful.
However, I want to resolve this issue some other way instead of adding the image to the kubelet args. Kindly let me know how I can get this fixed.
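For concreteness, the kubelet flag in question looks roughly like this (the pause image and tag here match the one mentioned in the answer below and may differ in your setup):

# extra kubelet argument (e.g. appended to KUBELET_ARGS in the kubelet's sysconfig/config file)
--pod-infra-container-image=gcr.io/google_containers/pause:2.0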
Also, after pod creation is done, the containers keep on restarting. However, if I check the logs via kubectl logs, the output shows the container's expected output.
But the container restarts often. To stop the restarting, I set restartPolicy: Never in the pod's spec file; then it didn't restart, but the container doesn't run either. Kindly help me.
Your description is very confusing. Can you please reply to this answer with:
1. What error you get when you do docker pull gcr.io/google_containers/pause:2.0
2. What cmd you're running in your container
For 2 you need a long running command, like while true; do echo SUCCESS; done, otherwise it'll just exit and get restarted by the kubelet with a RestartPolicy of Always.
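A sketch of a pod spec with such a long-running command (the image is arbitrary, and a sleep is added so the loop doesn't spin):

apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  restartPolicy: Always
  containers:
  - name: demo
    image: busybox            # arbitrary image for illustration
    command: ["/bin/sh", "-c"]
    args: ["while true; do echo SUCCESS; sleep 10; done"]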