Kubernetes (MicroK8S) pod stuck in ContainerCreating state ("Pulling image")

In some cases after a node reboot, the Kubernetes cluster managed by MicroK8s cannot start pods.
A describe of the pod failing to become ready showed that it was stuck in a "Pulling image" state for several minutes without any other event, as follows:
Events:
  Type    Reason     Age  From               Message
  ----    ------     ---  ----               -------
  Normal  Scheduled  50s  default-scheduler  Successfully assigned default/my-service-abc to my-desktop
  Normal  Pulling    8m   kubelet            Pulling image "127.0.0.1:32000/some-service"
Pulling from the node with docker pull 127.0.0.1:32000/some-service works perfectly, so it doesn't seem to be a problem with Docker.
I have upgraded Docker just in case.
I appear to be running the latest version of MicroK8s.

Running sudo microk8s inspect gives no error or warning; everything seems to work perfectly.
Since a docker pull did work locally, it's really the kubelet, which talks to Docker, that seems stuck.
Even restarting Docker with sudo service docker stop && sudo service docker start did not help.
Even a rollout did not get the pods out of the Pulling state after the Docker restart.
Worst of all, a reboot of the server did not change anything: the pods that were already up kept working, but all the other pods (70%) were down and stuck in ContainerCreating state.
Checking the status with systemctl status snap.microk8s.daemon-kubelet did not report any error.
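Beyond the status output, the kubelet's own log is worth tailing while a pod is stuck; a minimal check, reusing the snap unit name above:
sudo journalctl -u snap.microk8s.daemon-kubelet -f   # follow the kubelet log for image pull / CRI errors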
The only thing that seemed to work was this:
sudo systemctl reboot snap.microk8s.daemon-kubelet
However, it also rebooted the whole node, so that's something to keep as a last resort (same as a plain node reboot).
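A less drastic option worth trying first is restarting only the kubelet daemon instead of the whole node; this is a sketch assuming the standard MicroK8s snap service names, not something verified on the affected node:
sudo systemctl restart snap.microk8s.daemon-kubelet   # restart just the kubelet service
microk8s status --wait-ready                          # block until MicroK8s reports ready again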

Related

OKD unable to pull larger images from Internal Registry right after deployment of microservices through Jenkinsx

I am trying to deploy microservices in OKD through Jenkinsx, and the deployment is successful every time.
But the pods go into an "ImagePullBackOff" error right after deployment and only come into the Running state after I delete them.
ImagePullBackOff Error:
Events:
The images are being pulled from OKD's internal registry, and the image is about 1.25 GB in size. The images are available in the internal registry when the pod tries to pull them.
I came across the "image-pull-progress-deadline" field to be updated in "/etc/origin/node/node-config.yaml" on all the nodes. I updated it on all the nodes but am still facing the same "ImagePullBackOff" error.
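For reference, this is roughly the shape I would expect that entry to take, assuming the OKD 3.x kubeletArguments convention in node-config.yaml (flag name mapped to a list of string values); the 30m deadline is only an illustrative value for a ~1.25 GB pull:
kubeletArguments:
  image-pull-progress-deadline:
    - "30m"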
I tried to restart the kubelet service, but that fails with a kubelet.service not found error:
[master ~]$ sudo systemctl status kubelet
Unit kubelet.service could not be found.
Please let me know whether a restart of the kubelet service is necessary, and any suggestions to resolve the "ImagePullBackOff" issue.
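One note, assuming this is OKD/Origin 3.x: there is no standalone kubelet.service because the kubelet runs inside the node service, so the restart after editing node-config.yaml would look roughly like this (the unit name may differ between Origin and OCP installs):
sudo systemctl status origin-node    # the OKD 3.x node service that embeds the kubelet
sudo systemctl restart origin-node   # restart it so the new image-pull-progress-deadline takes effect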

Installing jFrog Artifactory via Helm, install errors

Attempted to install: jFrog Artifactory HA
Platform: GCE kubernetes cluster on CoreOS; 1 master, 2 workers
Installation method: Helm chart
Helm steps taken:
Add jFrog repo to local helm: helm repo add jfrog https://charts.jfrog.io
Install license as kubernetes secret in cluster: kubectl create secret generic artifactory-cluster-license --from-file=./art.lic
Install via helm:
helm install --name artifactory-ha jfrog/artifactory-ha \
  --set artifactory.masterKey=,artifactory.license.secret=artifactory-cluster-license,artifactory.license.dataKey=art.lic
Result:
Helm installation went without complaint. Checked the services; they seemed fine, and the LoadBalancer was pending and then came online.
Checked PVs and PVCs, seemed to be fine and bound:
NAME                                             STATUS
artifactory-ha-postgresql                        Bound
volume-artifactory-ha-artifactory-ha-member-0    Bound
volume-artifactory-ha-artifactory-ha-primary-0   Bound
Checked the pods and only postgres was ready:
NAME                                         READY   STATUS     RESTARTS   AGE
artifactory-ha-artifactory-ha-member-0       0/1     Running    0          3m
artifactory-ha-artifactory-ha-primary-0      0/1     Running    0          3m
artifactory-ha-nginx-697844f76-jt24s         0/1     Init:0/1   0          3m
artifactory-ha-postgresql-676999df46-bchq9   1/1     Running    0          3m
Waited for a few minutes, no change. Waited 2 hours, still in the same state as above. Checked the logs of the artifactory-ha-artifactory-ha-primary-0 pod (it's quite long, but I can post it if that will help anybody determine the problem) and noted this error:
SEVERE: One or more listeners failed to start. Full details will be found in the appropriate container log file.
I couldn't think of where else to check for logs. Services were running, and the other pods seemed to be waiting on this primary pod.
The log continues with SEVERE: Context [/artifactory] startup failed due to previous errors and then starts spewing Java stack dumps after the "ACCESS" ASCII art, with messages like WARNING: The web application [artifactory] appears to have started a thread named [Thread-5] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
I ended up leaving the cluster up overnight, and now, about 12 hours later, I'm very surprised to see that the "primary" pod did actually come online:
NAME                                         READY   STATUS        RESTARTS   AGE
artifactory-ha-artifactory-ha-member-0       1/1     Terminating   0          19m
artifactory-ha-artifactory-ha-member-1       0/1     Terminating   0          17m
artifactory-ha-artifactory-ha-primary-0      1/1     Running       0          3h
artifactory-ha-nginx-697844f76-vsmzq         0/1     Running       38         3h
artifactory-ha-postgresql-676999df46-gzbpm   1/1     Running       0          3h
The nginx pod, though, did not. It eventually succeeded at its init container command (until nc -z -w 2 artifactory-ha 8081 && echo artifactory ok; do), but it cannot pass its readiness probe:
Warning  Unhealthy  1m (x428 over 3h)  kubelet, spczufvthh-worker-1  Readiness probe failed: Get http://10.2.2.45:80/artifactory/webapp/#/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Perhaps I missed some required step in the setup or some Helm installation switches? This is my first attempt at setting up jFrog Artifactory HA, and I noticed most of the instructions seem to be for bare-metal clusters, so perhaps I confused something.
Any help is appreciated!
Turned out we messed up a couple of things and had a few misunderstandings about how the install process works. Maybe this will be of some help to people in the future.
1) The masterKey value needs to be at least 16 characters long. We had initially tried too short a key. We tried installing again and writing this new masterKey to a secret instead, but...
2) The values in the secrets seem to get read once at the initial install attempt; after that they are written to the persistent volume, and updating the secret seems to have no effect.
3) We also didn't understand the license key format and constraints. You need a license for every node that will run Artifactory, and all the licenses go into a single file, with each license separated by two newlines.
The error logs were pretty unhelpful to us with these errors. We eventually wiped out the install, including the PVs, and finally everything went fine.
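A minimal sketch of how a clean retry could look given the three points above; the openssl command and the master.key filename are illustrative assumptions, not taken from the chart's documentation:
# generate a masterKey comfortably over the 16-character minimum (32 hex characters)
openssl rand -hex 16 > master.key
# recreate the license secret; art.lic holds one license per node, separated by two newlines
kubectl create secret generic artifactory-cluster-license --from-file=./art.lic
# install with the long masterKey passed in
helm install --name artifactory-ha jfrog/artifactory-ha \
  --set artifactory.masterKey=$(cat master.key),artifactory.license.secret=artifactory-cluster-license,artifactory.license.dataKey=art.lic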

Setting up kubernetes on linux raspberry

I am trying to set up a Kubernetes cluster on Raspberry Pi. I have two Pis; one of them will work as the master and the other one as a worker.
I am not using HypriotOS; instead I am using the Raspbian Stretch image. I followed these tutorials: link1, link2. Link1 recommends using HypriotOS, but I continued with Raspbian Stretch. This is what I have done so far on both the master and the worker:
Installed Docker
Disabled the swapfile
Added cgroup flags to /boot/cmdline.txt (see the sketch after this list)
Installed Kubernetes on both Pis
Initialized the master, then joined the worker to the master node
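A sketch of how the swap and cgroup steps are commonly done on Raspbian; the exact cmdline.txt flags are an assumption here (the usual ones for Kubernetes on Raspberry Pi), not copied from the tutorials:
# disable the swapfile permanently
sudo dphys-swapfile swapoff && sudo dphys-swapfile uninstall && sudo update-rc.d dphys-swapfile remove
# append to the single existing line in /boot/cmdline.txt, then reboot
cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1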
Till now everything seems to be working OK. But when running the command kubectl get nodes, I get:
NAME          STATUS     ROLES    AGE   VERSION
raspberrypi   NotReady   master   1h    v1.8.4
worker        NotReady   <none>   40m   v1.8.4
My first question is why it shows the worker as NotReady even though my worker Pi is up and running.
My next question is how I can access the cluster from its dashboard, and how to install the dashboard.
Issues have been resolved in the comment section.
For debugging Kubernetes nodes in the cluster, we use the following commands to get precise information.
Get the list of nodes:
kubectl get nodes
Get comprehensive info about a node:
kubectl describe nodes NODE_NAME
From the above information we can verify and validate the status of the kubelet, Docker, and kube-proxy.
It is showing the status as NotReady because you have not installed any network plugin. Weave is suitable for networking in the case of Raspberry Pi. You can install it using the command below:
kubectl apply -f https://git.io/weave-kube-1.6
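Once the Weave pods come up, the nodes should flip to Ready; a quick check, assuming the name=weave-net label that the Weave manifest normally applies:
kubectl get pods -n kube-system -l name=weave-net   # one Weave pod per node
kubectl get nodes                                    # both nodes should now report Ready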
Have a look at these tutorials:
https://www.youtube.com/watch?v=zc0sbXwONM4&list=PLWw98q-Xe7iHSVH-AE9hDGBFtC9rFxcME

Kubernetes Master Server is failing to become up and running

Installed kubeadm v1.6.0-alpha, kubectl v1.5.3, kubelet v1.5.3
Executed the command kubeadm init to bring the Kubernetes master up.
Issue observed: stuck with the below log message
Created API client, waiting for the control plane to become ready
How do I make the Kubernetes master server up and running, or how can I debug the issue?
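A couple of hedged places to look while kubeadm init sits at that message (assuming a systemd host where the control-plane components run as Docker containers, as was typical for kubeadm of that era):
sudo journalctl -u kubelet -f               # kubelet log, where control-plane start failures usually show up
sudo docker ps | grep -E 'apiserver|etcd'   # check whether the API server and etcd containers are running at all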
Could you try using kubelet and kubectl 1.6 to see if it is a version mismatch?
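If it is a mismatch, aligning the host packages with the kubeadm series would look roughly like this on a Debian-based node using the Kubernetes apt repository; the exact -00 package revision is an assumption:
sudo apt-get update
sudo apt-get install -y kubelet=1.6.0-00 kubectl=1.6.0-00   # match kubelet/kubectl to kubeadm 1.6
sudo systemctl restart kubelet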

Pod Containers Keep on Restarting

Container getting killed at node after pod creation
The issue was raised on GitHub and I was asked to move it to SO:
https://github.com/kubernetes/kubernetes/issues/24241
However, I am briefing my issue here. After creating the pod it doesn't run, since I have to mention the container image in the kubelet args under --pod-infra-container-image, as mentioned below.
I solved the "ContainerCreating" pod status issue by adding the image name in --pod-infra-container-image=, and then pod creation was successful.
However, I want to resolve this issue some other way instead of adding the image name to the kubelet args. Kindly let me know how to get this fixed.
Also, after pod creation is done, the containers keep restarting. However, if I check the logs via kubectl logs, the output shows the container's expected output.
But the container restarts often. To stop the restarting I set restartPolicy: Never in the pod's spec file; then it didn't restart, but the container doesn't run. Kindly help me.
Your description is very confusing; can you please reply to this answer with:
1. What error you get when you do docker pull gcr.io/google_containers/pause:2.0
2. What cmd you're running in your container
For 2 you need a long-running command, like while true; do echo SUCCESS; done, otherwise it will just exit and get restarted by the kubelet with a RestartPolicy of Always.
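As an illustration of the long-running-command point, here is a minimal pod sketch; the pod name, busybox image, and sleep interval are all hypothetical choices, not taken from the issue above:
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: long-running-demo
spec:
  restartPolicy: Always
  containers:
  - name: demo
    image: busybox
    # keep PID 1 alive so the kubelet does not keep restarting the container
    command: ["/bin/sh", "-c", "while true; do echo SUCCESS; sleep 5; done"]
EOF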