I have a 3-node ubuntu microk8s installation and it seems to be working ok. All 3 nodes are management nodes.
On only one of the nodes, I get an error message and associated delay whenever I use a kubectl command. It looks like this:
$ time kubectl get pods
I0324 03:49:44.270996 514696 request.go:665] Waited for 1.156689289s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:16443/apis/authentication.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE
sbnweb-5f9d9b977f-lw7t9 1/1 Running 1 (10h ago) 3d3h
shell-6cfccdbd47-zd2tn 1/1 Running 0 6h39m
real 0m6.558s
user 0m0.414s
sys 0m0.170s
The error message always shows a different URL each time. I tried looking up the error code (I0324) and haven't found anything useful.
The other two nodes don't show this behavior. No error message and completes the request in less than a second.
I'm new to k8s so I am not sure how to diagnose this kind of problem. Any hints on what to look for would be greatly appreciated.
Here's a good write-up about the issue. For some cases rm -rf ~/.kube/cache will remove the issue.
I had a same error with kubectl on Windows. Deleting "http-cache" folder in ".kube" fixed it problem. c:\Users****.kube\http-cache\
Related
My Pods getting SIGTERM automatically for unknown reason. Unable to find root cause of why SIGTERM sent from kubelet to my pods is the concern to fix issue.
When I ran kubectl describe podname -n namespace, under events section Only killing event is present. I didn't see any unhealthy status before kill event.
Is there any way to debug further with events of pods or any specific log files where we can find trace of reason for sending SIGTERM?
I tried to do kubectl describe on events(killing)but it seems no such command to drill down events further.
Any other approach to debug this issue is appreciated.Thanks in advance!
kubectl desribe pods snippet
Please can you share the yaml of your deployment so we can try to replicate your problem.
Based on your attached screenshot, it looks like your readiness probe failed to complete repeatedly (it didn't run and fail, it failed to complete entirely), and therefore the cluster killed it.
Without knowing what your docker image is doing makes it hard to debug from here.
As a first point of debugging, you can try doing kubectl logs -f -n {namespace} {pod-name} to see what the pod is doing and seeing if it's erroring there.
The error Client.Timeout exceeded while waiting for headers implies your container is proxying something? So perhaps what you're trying to proxy upstream isn't responding.
I am currently working on a monitoring service that will monitor Kubernetes' deployments and their pods. I want to notify users when a deployment is not running the expected amount of replicas and also when pods' containers restart unexpectedly. This may not be the right things to monitor and I would greatly appreciate some feedback on what I should be monitoring.
Anyways, the main question is the differences between all of the Statuses of pods. And when I say Statuses I mean the Status column when running kubectl get pods. The statuses in question are:
- ContainerCreating
- ImagePullBackOff
- Pending
- CrashLoopBackOff
- Error
- Running
What causes pod/containers to go into these states?
For the first four Statuses, are these states recoverable without user interaction?
What is the threshold for a CrashLoopBackOff?
Is Running the only status that has a Ready Condition of True?
Any feedback would be greatly appreciated!
Also, would it be bad practice to use kubectl in an automated script for monitoring purposes? For example, every minute log the results of kubectl get pods to Elasticsearch?
You can see the pod lifecycle details in k8s documentation.
The recommended way of monitoring kubernetes cluster and applications are with prometheus
I will try to tell what I see hidden behind these terms
ContainerCreating
Showing when we wait to image be downloaded and the
container will be created by a docker or another system.
ImagePullBackOff
Showing when we have problem to download the image from a registry. Wrong credentials to log in to the docker hub for example.
Pending
The container starts (if start take time) or started but redinessProbe failed.
CrashLoopBackOff
This status showing when container restarts occur too much often. For example, we have process that tries to read not exists file and crash. Then the container will be recreated by Kube and repeat.
Error
This is pretty clear. We have some errors to run the container.
Running
All is good container running and livenessProbe is OK.
Attempted to install: jFrog Artifactory HA
Platform: GCE kubernetes cluster on CoreOS; 1 master, 2 workers
Installation method: Helm chart
Helm steps taken:
Add jFrog repo to local helm: helm repo add jfrog https://charts.jfrog.io
Install license as kubernetes secret in cluster: kubectl create secret generic artifactory-cluster-license --from-file=./art.lic
Install via helm:
helm install --name artifactory-ha jfrog/artifactory-ha
--set artifactory.masterKey=,artifactory.license.secret=artifactory-cluster-license,artifactory.license.dataKey=art.lic
Result:
Helm installation went without complaint. Checked services, seemed to be fine, LoadBalancer was pending and came online.
Checked PVs and PVCs, seemed to be fine and bound:
NAME STATUS
artifactory-ha-postgresql Bound
volume-artifactory-ha-artifactory-ha-member-0 Bound
volume-artifactory-ha-artifactory-ha-primary-0 Bound
Checked the pods and only postgres was ready:
NAME READY STATUS RESTARTS AGE
artifactory-ha-artifactory-ha-member-0 0/1 Running 0 3m
artifactory-ha-artifactory-ha-primary-0 0/1 Running 0 3m
artifactory-ha-nginx-697844f76-jt24s 0/1 Init:0/1 0 3m
artifactory-ha-postgresql-676999df46-bchq9 1/1 Running 0 3m
Waited for a few minutes, no change. Waited 2 hours, still at the same state as above. Checked logs of the artifactory-ha-artifactory-ha-primary-0 pod (it's quite long, but I can post if that will help anybody determine the problem), but noted this error:
SEVERE: One or more listeners failed to start. Full details will be found in the appropriate container log file. I couldn't think of where else to check for logs. Services were running, other pods seemed to be waiting on this primary pod.
The log continues with SEVERE: Context [/artifactory] startup failed due to previous errors and then starts spewing Java stack dumps after the "ACCESS" ASCII art, messages like WARNING: The web application [artifactory] appears to have started a thread named [Thread-5] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
I ended up leaving the cluster up over night, and now, about 12 hours later, I'm very surprised to see that the "primary" pod did actually come online:
NAME READY STATUS RESTARTS AGE
artifactory-ha-artifactory-ha-member-0 1/1 Terminating 0 19m
artifactory-ha-artifactory-ha-member-1 0/1 Terminating 0 17m
artifactory-ha-artifactory-ha-primary-0 1/1 Running 0 3h
artifactory-ha-nginx-697844f76-vsmzq 0/1 Running 38 3h
artifactory-ha-postgresql-676999df46-gzbpm 1/1 Running 0 3h
Though, the nginx pod did not. It eventually succeeded at its init container command (until nc -z -w 2 artifactory-ha 8081 && echo artifactory ok; do), but cannot pass its readiness probe: Warning Unhealthy 1m (x428 over 3h) kubelet, spczufvthh-worker-1 Readiness probe failed: Get http://10.2.2.45:80/artifactory/webapp/#/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Perhaps I missed some required step in the setup or helm installation switches? This is my first attempt at setting up jFrog Artifactory HA, and I noticed most of the instructions seem to be for baremetal clusters, so perhaps I confused something.
Any help is appreciated!
Turned out we messed up a couple of things, and had a few misunderstandings about how the install process works. Maybe this will be some help to people in the future.
1) The masterKey value needs to be at least 16 characters long. We had initially tried too short of a key. We tried installing again and writing this new masterKey to a secret instead, but...
2) The values in the secrets seem to get read once at initial install attempt, then they are written to the persistent volume and updating the secret after that seems to have no effect.
3) We also didn't understand the license key format and constraints. You need a license for every node that will run Artifactory, and all the licenses go into a single file, with each license separated by two return/new lines.
The error logs were pretty unhelpful to us in these errors. We eventually wiped out the install, including the PVs, and finally everything went fine.
I am new to Kubernetes and started working with it from past one month.
When creating the setup of cluster, sometimes I see that Heapster will be stuck in Container Creating or Pending status. After this happens the only way have found here is to re-install everything from the scratch which has solved our problem. Later if I run the Heapster it would run without any problem. But I think this is not the optimal solution every time. So please help out in solving the same issue when it occurs again.
Heapster image is pulled from the github for our use. Right now the cluster is running fine, So could not send the screenshot of the heapster failing with it's status by staying in Container creating or Pending status.
Suggest any alternative for the problem to be solved if it occurs again.
Thanks in advance for your time.
A pod stuck in pending state can mean more than one thing. Next time it happens you should do 'kubectl get pods' and then 'kubectl describe pod '. However, since it works sometimes the most likely cause is that the cluster doesn't have enough resources on any of its nodes to schedule the pod. If the cluster is low on remaining resources you should get an indication of this by 'kubectl top nodes' and by 'kubectl describe nodes'. (Or with gke, if you are on google cloud, you often get a low resource warning in the web UI console.)
(Or if in Azure then be wary of https://github.com/Azure/ACS/issues/29 )
Right now, I deployed some pods on my kubernetes cluster. But sometime, my image may has some bugs which make the pod cannot start correctly.
For example:
nats-1 0/1 CrashLoopBackOff 121 10h
I also cannot see any error in the kubectl log.
So is there any way to access this pod? Or is there any tools or tech can allow to to enter the container?
Thanks a lot all! :)
You can kubectl describe to get the events, it sometimes might show some errors there. Otherwise you can probably also make the deployment/pod run a command like sleep 3600 to keep it open for you to exec into it to investigate further.
Edited after clarification:
You could go into the worker (kubectl get pod <pod-name> -o wide to get which one) and access the node syslogs or pods' logs. That should show you a more detailed information of what happened.
But #ho-man approach is very valid and less cumbersome.