I am working with two servers, each running a number of pods. The first is a Validation environment where I can use Kubernetes commands freely, but the second is a Production environment where I do not have full rights, and getting full rights there is out of the question.
I am compiling platform stability statistics and need information about the last restart of each pod. I can see the "Age" column, but I cannot use a screenshot in my report, so I need a command that outputs every pod's age or last restart time.
P.S. Every night at 00:00 the pod logs are saved and archived in a separate folder.
kubectl get pods already gives you that info:
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-7cdbd8cdc9-8pnzq 1/1 Running 0 36s
$ kubectl delete po nginx-7cdbd8cdc9-8pnzq
pod "nginx-7cdbd8cdc9-8pnzq" deleted
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-7cdbd8cdc9-67l8l 1/1 Running 0 4s
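If the rounded AGE column is too coarse for a report, the exact creation time of every pod can be printed instead with custom columns; this only needs read access, so it should work on the Prod environment too. A minimal sketch: the kubectl call is shown as a comment (it needs a cluster), and the age arithmetic is demonstrated on a made-up timestamp using GNU date:

```shell
# On a cluster (read-only; no special rights needed):
#   kubectl get pods -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
# Given such a timestamp, the exact age in seconds can be computed with GNU date.
# The timestamp below is a made-up example value:
created="2019-11-20T09:12:03Z"
created_ts=$(date -u -d "$created" +%s)   # parse RFC 3339 timestamp to epoch seconds
now_ts=$(date -u +%s)                     # current time in epoch seconds
echo "age: $(( now_ts - created_ts ))s"
```

Note that creationTimestamp is when the pod object was created; since a restarted pod under a Deployment is a new pod object, this effectively dates the last reset.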
I found a solution:
command:
zgrep "All subsystems started successfully" 201911??/*ota*
response:
23:23:37,429 [INFO ] main c.o.c.a.StartUp - All subsystems started successfully
P.S. "ota" is my pod's name.
We have an Argo Rollout for one of our services. I used this command to update the image:
kubectl-argo-rollouts -n ddash5 set image detector detector=starry-academy-177207/detector:deepak-detector-8
I was expecting this to update the pod, but it created a new one.
NAME READY STATUS RESTARTS AGE
detector-5d96bc8456-h2x7p 1/1 Running 0 35m
detector-68f89d8b45-j465j 0/1 Running 0 35m
Even if I delete detector-5d96bc8456-h2x7p, the pod gets recreated with the older image,
and detector-68f89d8b45-j465j stays in the 0/1 state.
I am new to Kubernetes; can someone give me some insight into this?
Thanks!!!
Deepak
You are using Argo Rollouts, where a rolling update replaces the Deployment's pod instances with new ones. The new Pods are scheduled on Nodes with available resources; that is why new pods are being created to replace the existing ones.
Instead, you can use the kubectl set image command, which updates the images of an existing Deployment without recreating it. Use the following command:
kubectl set image deployment/<deployment-name> <container-name>=<image>:<tag>
In your case:
kubectl set image deployment/detector detector=starry-academy-177207/detector:deepak-detector-8
This will update the existing Deployment; try it and let me know if it works. I also found ArgoCD Image Updater, which you can check out.
I have a 3-node ubuntu microk8s installation and it seems to be working ok. All 3 nodes are management nodes.
On only one of the nodes, I get an error message and associated delay whenever I use a kubectl command. It looks like this:
$ time kubectl get pods
I0324 03:49:44.270996 514696 request.go:665] Waited for 1.156689289s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:16443/apis/authentication.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE
sbnweb-5f9d9b977f-lw7t9 1/1 Running 1 (10h ago) 3d3h
shell-6cfccdbd47-zd2tn 1/1 Running 0 6h39m
real 0m6.558s
user 0m0.414s
sys 0m0.170s
The error message always shows a different URL each time. I tried looking up the error code (I0324) and haven't found anything useful.
The other two nodes don't show this behavior: no error message, and the request completes in less than a second.
I'm new to k8s so I am not sure how to diagnose this kind of problem. Any hints on what to look for would be greatly appreciated.
Here's a good write-up about the issue. In some cases, rm -rf ~/.kube/cache will resolve it.
I had the same error with kubectl on Windows. Deleting the "http-cache" folder in ".kube" fixed the problem: C:\Users\****\.kube\http-cache\
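Both answers boil down to clearing kubectl's client-side cache, which is rebuilt automatically on the next kubectl call. A combined sketch for Linux/macOS (the paths are the defaults; adjust if your kubeconfig directory differs):

```shell
# Remove kubectl's client-side discovery and HTTP caches.
# Safe to run: kubectl recreates them on the next invocation.
rm -rf ~/.kube/cache ~/.kube/http-cache
```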
When you run "kubectl get pods -A -o wide" you get a list of pods and a STATUS column.
Where can I get a list of the possible status options?
What I'm trying to do is generate a list of statuses and how many pods are in each one. If I had a list of the possible status states, I could do what I need.
Thanks.
If you also want the result on a per-container basis (READY counts included), you can try this command; note that the byte offsets in cut depend on your output's column widths and may need adjusting:
kubectl get pods -A -o wide --no-headers | cut -b 85-108 | sort | uniq -c
The output will look something like:
2 0/1 CrashLoopBackOff
1 0/3 Pending
260 1/1 Running
4 2/2 Running
As a comment in Complete list of pod statuses suggests:
$ kubectl get pod -A --no-headers |awk '{arr[$4]++}END{for (a in arr) print a, arr[a]}'
Evicted 1
Running 121
CrashLoopBackOff 4
Completed 5
Pending 1
This command shows how many pods are currently in each state.
But how do you get all the possible state values?
As far as I know, there is no API or command to get them.
This status is "the aggregate status of the containers in this pod." The source code can be found at https://github.com/kubernetes/kubernetes/blob/master/pkg/printers/internalversion/printers.go#L741; it derives the printed status from pod.Status.Phase (among other conditions) and may change between versions.
A phase of a Pod is a simple, high-level summary of where the Pod is in its Lifecycle.
The phase is not intended to be a comprehensive rollup of observations of Container or Pod state, nor is it intended to be a comprehensive state machine.
Here are the possible values for phase:
Pending: The Pod has been accepted by the Kubernetes system, but one or more of the Container images has not been created. This includes time before being scheduled as well as time spent downloading images over the network, which could take a while.
Running: The Pod has been bound to a node, and all the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.
Succeeded: All Containers in the Pod have terminated in success, and will not be restarted.
Failed: All Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system.
Unknown: For some reason the state of the Pod could not be obtained, typically due to an error in communicating with the host of the Pod.
If you are interested in detailed arrays with Pod conditions, I suggest looking at Pod Lifecycle from Kubernetes documentation and inspect source code for remaining information.
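Putting the two answers together: counting the underlying .status.phase via jsonpath is more stable than parsing the printed STATUS column, since the printer mixes phases with reasons like CrashLoopBackOff. A sketch, with the cluster-bound kubectl call shown as a comment and the counting pipeline demonstrated on made-up phase values:

```shell
# On a cluster:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' | sort | uniq -c
# Demonstration of the counting pipeline on sample phase values:
printf '%s\n' Running Running Pending Running Succeeded | sort | uniq -c
```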
I have a requirement: I want to know where every part of a Pod's startup time is spent.
How much time does it take to pull a Docker image? A Pod may have multiple initContainers and containers, and I want to know the timing of each of them.
Maybe I can analyze the Events using
'kubectl describe pod-name...'
How much time does a Pod take to get ready, from creation to passing its readiness probe?
For a bare Pod, I can see its startTime and the time it finished, and calculate the duration from those.
But for pods created by a Deployment, StatefulSet, or DaemonSet, I cannot find any timestamp indicating the first time the Pod became ready.
I want to know how much time it took for the Pod to become ready, not the age of the Pod.
The easiest method would be to subscribe to the API server so it notifies you when changes occur in your cluster.
For example, I issued:
$ kubectl get pods --output-watch-events --watch
and then created a new pod. Here is the output:
EVENT NAME READY STATUS RESTARTS AGE
ADDED example-pod 0/1 Pending 0 0s
MODIFIED example-pod 0/1 ContainerCreating 0 0s
MODIFIED example-pod 0/1 Running 0 19s
MODIFIED example-pod 1/1 Running 0 23s
and here is a little explanation:
As you can see, the first event is ADDED with the pod in the Pending state, which means the pod object has just been created.
The second event is MODIFIED with ContainerCreating status and age 0, which means it took less than 1s to assign/schedule the pod to a node. Now the kubelet starts downloading the container image.
The third event has Running status, meaning the container has started running. Looking at the AGE column, you can see 19s passed since the previous event, so it took around 19s to download the image and start the container. The READY column still shows 0/1, so the container is running but not yet in the ready state.
The fourth event has the READY column set to 1/1, so the readiness probe has passed successfully. Looking at the AGE column again, it took around 4s (23-19) for the readiness probe to pass and the pod status to change.
If this information is not enough, you can use the --output parameter to receive the full pod specification on every change.
You can also play with kubectl get events to receive some more events. And of course, by adding the --watch flag you can watch events in real time.
If you want a higher level of flexibility, use dedicated Kubernetes client libraries instead of kubectl to receive and process this information.
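The age arithmetic above can be scripted: given the AGE values (in seconds) of the successive watch events, the per-stage durations are just the differences between neighbors. A sketch using the sample values from the output above; on a cluster these would be parsed from the watch output itself:

```shell
# Event ages in seconds, taken from the sample watch output
# (Pending, ContainerCreating, Running 0/1, Running 1/1).
# On a cluster: kubectl get pods --output-watch-events --watch
printf '%s\n' 0 0 19 23 |
  awk 'NR > 1 { print "stage " NR-1 ": " $1 - prev "s" } { prev = $1 }'
# prints:
#   stage 1: 0s   (scheduling)
#   stage 2: 19s  (image pull + container start)
#   stage 3: 4s   (readiness probe)
```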
Attempted to install: jFrog Artifactory HA
Platform: GCE kubernetes cluster on CoreOS; 1 master, 2 workers
Installation method: Helm chart
Helm steps taken:
Add jFrog repo to local helm: helm repo add jfrog https://charts.jfrog.io
Install license as kubernetes secret in cluster: kubectl create secret generic artifactory-cluster-license --from-file=./art.lic
Install via helm:
helm install --name artifactory-ha jfrog/artifactory-ha \
  --set artifactory.masterKey=,artifactory.license.secret=artifactory-cluster-license,artifactory.license.dataKey=art.lic
Result:
Helm installation went without complaint. Checked services, seemed to be fine, LoadBalancer was pending and came online.
Checked PVs and PVCs, seemed to be fine and bound:
NAME STATUS
artifactory-ha-postgresql Bound
volume-artifactory-ha-artifactory-ha-member-0 Bound
volume-artifactory-ha-artifactory-ha-primary-0 Bound
Checked the pods and only postgres was ready:
NAME READY STATUS RESTARTS AGE
artifactory-ha-artifactory-ha-member-0 0/1 Running 0 3m
artifactory-ha-artifactory-ha-primary-0 0/1 Running 0 3m
artifactory-ha-nginx-697844f76-jt24s 0/1 Init:0/1 0 3m
artifactory-ha-postgresql-676999df46-bchq9 1/1 Running 0 3m
Waited for a few minutes, no change. Waited 2 hours, still at the same state as above. Checked logs of the artifactory-ha-artifactory-ha-primary-0 pod (it's quite long, but I can post if that will help anybody determine the problem), but noted this error:
SEVERE: One or more listeners failed to start. Full details will be found in the appropriate container log file. I couldn't think of where else to check for logs. Services were running, other pods seemed to be waiting on this primary pod.
The log continues with SEVERE: Context [/artifactory] startup failed due to previous errors and then starts spewing Java stack dumps after the "ACCESS" ASCII art, messages like WARNING: The web application [artifactory] appears to have started a thread named [Thread-5] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
I ended up leaving the cluster up over night, and now, about 12 hours later, I'm very surprised to see that the "primary" pod did actually come online:
NAME READY STATUS RESTARTS AGE
artifactory-ha-artifactory-ha-member-0 1/1 Terminating 0 19m
artifactory-ha-artifactory-ha-member-1 0/1 Terminating 0 17m
artifactory-ha-artifactory-ha-primary-0 1/1 Running 0 3h
artifactory-ha-nginx-697844f76-vsmzq 0/1 Running 38 3h
artifactory-ha-postgresql-676999df46-gzbpm 1/1 Running 0 3h
Though, the nginx pod did not. It eventually succeeded at its init container command (until nc -z -w 2 artifactory-ha 8081 && echo artifactory ok; do), but cannot pass its readiness probe: Warning Unhealthy 1m (x428 over 3h) kubelet, spczufvthh-worker-1 Readiness probe failed: Get http://10.2.2.45:80/artifactory/webapp/#/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Perhaps I missed some required step in the setup or helm installation switches? This is my first attempt at setting up jFrog Artifactory HA, and I noticed most of the instructions seem to be for baremetal clusters, so perhaps I confused something.
Any help is appreciated!
It turned out we had messed up a couple of things and had a few misunderstandings about how the install process works. Maybe this will be of some help to people in the future.
1) The masterKey value needs to be at least 16 characters long. We had initially tried too short of a key. We tried installing again and writing this new masterKey to a secret instead, but...
2) The values in the secrets seem to get read once at initial install attempt, then they are written to the persistent volume and updating the secret after that seems to have no effect.
3) We also didn't understand the license key format and constraints. You need a license for every node that will run Artifactory, and all the licenses go into a single file, with each license separated by two newlines (i.e., a blank line between licenses).
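To illustrate point 3, the combined license file for a two-node cluster would be laid out like this (placeholder text, not real license keys):

```
<license key for the primary node>

<license key for the member node>
```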
The error logs were pretty unhelpful to us in these errors. We eventually wiped out the install, including the PVs, and finally everything went fine.