Kubernetes events for container activity on node

I have a Kubernetes cluster with a master and several nodes. I am interested in listening on the master for events whenever any node creates or stops a container.
I am looking for something similar to docker events, which keeps listening for events and prints output to the screen on any activity.
Can someone please let me know how I can do this for Kubernetes?

You might want to dive into the API reference documentation.
In order to see all events, you can watch one of the object types of interest and perhaps filter the list down so that you don't see everything. How that's done is described in the API operations guide.
A first, very simple try would be http://<kubernetes-master>:8080/api/v1/pods?watch=true to see the stream of events for v1.Pod objects.
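As a hedged sketch (this assumes the API server still exposes the legacy insecure port 8080; on a secured cluster you would need proper authentication), you can follow that stream with curl, where -N disables output buffering so events appear as they arrive:
curl -N "http://<kubernetes-master>:8080/api/v1/pods?watch=true"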
Another way to discover the API is to use kubectl in verbose mode. If you have found a kubectl command which gets you what you need, you can add -v=6 to it to see which API URL is called to fetch the data. In your program you can then use the same URL to get your data without kubectl in the middle.
Using the example from Janos, this would be kubectl get ev -w -v=6, which results in something like:
...
I0322 17:03:55.738391 18068 round_trippers.go:318] GET http://127.0.0.1:8080/api/v1/watch/namespaces/default/events?resourceVersion=18474970 200 OK in 0 milliseconds
...
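Knowing that URL, a minimal sketch without kubectl in the middle could be the following (again assuming the insecure local endpoint from the log above; omitting the resourceVersion parameter simply starts a fresh watch):
curl -N "http://127.0.0.1:8080/api/v1/watch/namespaces/default/events"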
Hope any of this helps.

Related

Where do I find the function for "kubectl describe <CRD>"?

I am studying the "kubectl describe" source code at https://github.com/kubernetes/kubectl/blob/master/pkg/describe/describe.go
However, I still could not figure out how "kubectl describe [CRD]" works (as in which function/functions are called).
I am a Go newbie, so I would like to get some pointers please. Thanks.
I have read the describePod function and understand more or less how it works, but I still could not figure out how "kubectl describe [CRD]" works.
The "kubectl describe " function can be found in the command-line interface (CLI) of Kubernetes, specifically in the "kubectl" tool. "kubectl" is used to manage and interact with a Kubernetes cluster and its resources.
The kubectl describe command shows detailed information about Kubernetes resources such as Pods, Deployments, Services, Nodes, Jobs, etc.
With a CRD (Custom Resource Definition) you can perform CRUD operations, i.e. issue create, update, get and delete commands, to access the custom resources. To use a CRD you need to use API groups.
Example:
Suppose you specify an API group of example.crd.com; this means you can issue the get, list, create, update, and delete commands to access the custom resources under the API group example.crd.com.
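As a purely hypothetical sketch (the widgets resource and my-widget object are made-up names for illustration), the standard verbs then work through kubectl using the fully qualified resource name <plural>.<group>:
kubectl get widgets.example.crd.com
kubectl describe widgets.example.crd.com my-widget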
You can use kubectl describe crd <crd_name> to get a description of the CRD.
For more information, refer to this official doc.
See these similar Stack Overflow questions, SO1 and SO2, for more information.

GCP Alerting Policy for failed GKE CronJob

What would be the best way to set up a GCP monitoring alert policy for a Kubernetes CronJob failing? I haven't been able to find any good examples out there.
Right now, I have an OK solution based on monitoring logs in the Pod with ERROR severity. I've found this to be quite flaky, however. Sometimes a job will fail for some ephemeral reason outside my control (e.g., an external server returning a temporary 500) and on the next retry, the job runs successfully.
What I really need is an alert that is only triggered when a CronJob is in a persistent failed state. That is, Kubernetes has tried rerunning the whole thing, multiple times, and it's still failing. Ideally, it could also handle situations where the Pod wasn't able to come up either (e.g., downloading the image failed).
Any ideas here?
Thanks.
First of all, confirm the GKE version that you are running. The following commands will help you identify both the default version and the available versions:
Default version.
gcloud container get-server-config --flatten="channels" --filter="channels.channel=RAPID" \
--format="yaml(channels.channel,channels.defaultVersion)"
Available versions.
gcloud container get-server-config --flatten="channels" --filter="channels.channel=RAPID" \
--format="yaml(channels.channel,channels.validVersions)"
Now that you know your GKE version: since what you want is an alert that is only triggered when a CronJob is in a persistent failed state, GKE Workload Metrics used to be the GCP solution for this, providing a fully managed and highly configurable way of sending all Prometheus-compatible metrics emitted by GKE workloads (such as a CronJob or a Deployment) to Cloud Monitoring. However, it is deprecated as of GKE 1.24 and has been replaced by Google Cloud Managed Service for Prometheus, which is now the best option you have inside GCP: it lets you monitor and alert on your workloads using Prometheus without having to manually manage and operate Prometheus at scale.
Plus, you have two options from outside of GCP: self-managed Prometheus, and the Prometheus Pushgateway, which is geared towards batch workloads such as CronJobs.
Finally, just FYI, this can also be done manually by querying for the job, checking its start time, and comparing that to the current time. With bash:
START_TIME=$(kubectl -n=your-namespace get job your-job-name -o json | jq '.status.startTime')
echo $START_TIME
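To complete that comparison, here is a rough bash sketch (it assumes GNU date; tr strips the JSON quotes that jq leaves around the timestamp):
NOW=$(date +%s)
START=$(date -d "$(echo "$START_TIME" | tr -d '"')" +%s)
echo "Job started $(( NOW - START )) seconds ago"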
Or you can get the job's current status as a JSON blob, as follows:
kubectl -n=your-namespace get job your-job-name -o json | jq '.status'
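Along the same lines, a hedged sketch for checking whether any of the job's pods have failed: .status.failed holds the count of failed attempts and is absent when there are none, hence the // 0 fallback:
kubectl -n=your-namespace get job your-job-name -o json | jq '.status.failed // 0'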
You can see the following thread for more reference too.
Taking the "Failed" state as the core of your requirement, setting up a bash script with kubectl that sends an email whenever it sees a job in the "Failed" state can be useful. Here are some examples:
while true; do if kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}' | grep -q True; then echo "Job failed" | mail -s jobfailed email@address; else sleep 1; fi; done
For newer Kubernetes versions:
while true; do kubectl wait --for=condition=failed job/myjob; echo "Job failed" | mail -s jobfailed email@address; done

How to monitor pod preemption event

I take care of a bunch of Rancher clusters, and on some of them developers use PriorityClasses to ensure that some of the more important workloads get scheduled. The three PriorityClasses are in the three-digit range, so they do not interfere with the default ones. However, at present none of the PriorityClasses is set as default, and neither is preemptionPolicy set, so it defaults to PreemptLowerPriority.
None of the rancher, longhorn, prometheus, grafana, etc., workloads have priorityClassName set.
Long story short, I believe this causes havoc on the cluster when resources are in short supply.
Before I take my opinion to the developers I would like to collect some data to back up my story.
The question: How do I detect if the pod was Terminated due to Preemption?
I tried to google the subject but couldn't find anything. I was hoping kube-state-metrics would have something, but I didn't find anything there either.
Any help would be greatly appreciated.
You can try to look for convincing data, such as the pod termination reason, with the help of kubectl.
You can see the last restart logs of a container using the following command:
kubectl logs podname -c containername --previous
You can also use the following command to check the lifecycle events sent by the kubelet to the apiserver about the pod.
kubectl describe pod podname
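As a hedged sketch for preemption specifically: the scheduler normally records an event with reason Preempted on the victim pod (assuming your scheduler version emits that reason), so you could filter for it directly across all namespaces:
kubectl get events -A --field-selector reason=Preempted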
Finally, you can also write a final message to /dev/termination-log, and this will show up as described in the docs.
To use kubectl commands with Rancher, kindly refer to this documentation page.

How to get status history/lineage for Kubernetes pods

I was wondering if there is a kubectl command to quickly get the history of all STATUS values for a given pod?
For example: let's say a pod my-test-pod went from ContainerCreating to Running to OOMKilled to Terminating.
I was wondering if there is a command that experts use to get this lineage. Appreciate a nudge.
Using kubectl get events you can only see events from the last hour. If you want to persist events for a longer duration you can use eventrouter. The event router serves as an active watcher of the event resource in the Kubernetes system, which takes those events and pushes them to a user-specified sink. This is useful for a number of different purposes, but most notably long-term behavioral analysis of your workloads running on your Kubernetes cluster.
kubectl get events, or kubectl describe pod, which shows the events for the pod at the bottom. However, events are only kept for a little while, so it's not a permanent history. For that you would need some webhooks or a tool like Prometheus.
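While the events still exist, a small sketch for narrowing the list down to one pod in chronological order (my-test-pod is the pod name from the question):
kubectl get events --field-selector involvedObject.name=my-test-pod --sort-by=.lastTimestamp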

Fetching Stackdriver Monitoring TimeSeries data for a pod running on a k8s cluster on GKE using the REST API

My objective is to fetch the time series of a metric for a pod running on a kubernetes cluster on GKE using the Stackdriver TimeSeries REST API.
I have ensured that Stackdriver monitoring and logging are enabled on the kubernetes cluster.
Currently, I am able to fetch the time series of all the resources available in a cluster using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>"
In order to fetch the time series of a given pod id, I am using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>" AND resource.labels.pod_id="<POD_ID>"
This filter returns an HTTP 200 OK with an empty response body. I have found the pod ID from the metadata.uid field received in the response of the following kubectl command:
kubectl get deploy -n default <SERVICE_NAME> -o yaml
However, when I use the Pod ID of a background container spawned by GKE/Stackdriver, I do get the time series values.
Since I am able to see Stackdriver metrics of my pod on the GKE UI, I believe I should also get the metric values using the REST API.
My doubts/questions are:
Am I fetching the Pod ID of my pod correctly using kubectl?
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
Is there some other way in which I can get the time series of my pod using the REST APIs?
I wouldn't rely on kubectl get deploy for pod IDs. I would get them with something like:
kubectl -n default get pods | grep <prefix-for-your-pod> | awk '{print $1}'
I don't think so, but the best way to find out is opening a support ticket with GCP if you have any doubts.
Not that I'm aware of; Stackdriver is the monitoring solution in GCP. Again, you can check with GCP support. There are other tools that you can use to get metrics from Kubernetes, like Prometheus. There are multiple guides on the web on how to set it up with Grafana on k8s. This is one, for example.
Hope it helps!
Am I fetching the Pod ID of my pod correctly using kubectl?
You could use JSONPath as the output format with kubectl, in this case iterating over the Pods and fetching the metadata.name and metadata.uid fields:
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\n"}{end}'
which will output something like this:
nginx-65899c769f-2j775 d4fr5t6-bc2f-11e8-81e8-42010a84011f
nginx2-77b5c9d48c-7qlps 4f5gh6r-bc37-11e8-81e8-42010a84011f
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
As @Rico mentioned in his answer, contacting GCP support could be a way forward if you don't get further with the troubleshooting; see below.
Is there some other way in which I can get the time series of my pod using the REST APIs?
You could use the APIs Explorer or the Metrics Explorer from within the Stackdriver portal. There are some good troubleshooting tips here, with a link to the APIs Explorer. In the Stackdriver Metrics Explorer it's fairly easy to reassemble the filter you've used, using dropdown lists to choose e.g. a particular pod_id.
Taken from the Troubleshooting the Monitoring guide (linked above) regarding an empty HTTP 200 response on filtered queries:
If your API call returns status code 200 and an empty response, there are several possibilities:
If your call uses a filter, then the filter might not have matched anything. The filter match is case-sensitive. To resolve filter problems, start by specifying only one filter component, such as metric.type, and see if you get results. Add the other filter components one-by-one.
If you are working with a custom metric, you might not have specified the project where your custom metric is defined.
I found this link when reading through the documentation of the Monitoring API. That link will take you to the APIs Explorer with some pre-filled fields; change these accordingly and add your own filter.
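For completeness, a minimal curl sketch of the same timeSeries.list call; the project ID, pod ID and time window below are placeholder assumptions you would need to replace:
curl -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/<PROJECT_ID>/timeSeries" \
  --data-urlencode 'filter=metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.pod_id="<POD_ID>"' \
  --data-urlencode 'interval.startTime=2018-09-25T00:00:00Z' \
  --data-urlencode 'interval.endTime=2018-09-25T01:00:00Z'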
I have not tested the REST API further at the moment, but hopefully this helps you move forward.