Timeline of Kubernetes events

I would like to be able to see all of the various things that happened to a kube cluster on a timeline, including when nodes were found to be dead, when new nodes were added, when pods crashed and when they were restarted.
So far the best that we have found is kubectl get event but that seems to have a few limitations:
it doesn't go back in time that far (I'm not sure how far it goes back. A day?)
it combines similar events and orders the resulting list by the time of the latest event in each group. This makes it impossible to know what happened during some time range since events in that range may have been combined with later events outside the range.
One idea that I have is to write a pod that will use the API to watch the stream of events and log them to a file. This would let us control retention and it seems that events that occur while we are watching will not be combined, solving the second problem as well.
What are other people doing about this?

My understanding is that Kubernetes itself dedups events, documented here:
https://github.com/kubernetes/kubernetes/blob/master/docs/design/event_compression.md
Once that happens, there is no way to get the individual events back.
See https://github.com/kubernetes/kubernetes/issues/36304 for complaints about how that loses info. https://github.com/kubernetes/kubernetes/pull/46034 at least improved the message. See also the KEP at https://github.com/kubernetes/enhancements/pull/1291 for recent discussion and a proposal to improve usability in kubectl.
How long are events retained? Their time-to-live is apparently controlled by the kube-apiserver --event-ttl option, which defaults to 1 hour:
https://github.com/kubernetes/kubernetes/blob/da53a247633/cmd/kube-apiserver/app/options/options.go#L71-L72
You can raise this, but it might require more resources for etcd. From what I saw in some 2015 GitHub discussions, the event TTL used to be 2 days, and events were the main thing stressing etcd...
In a pinch, it might be possible to figure out what happened earlier from various logs, especially the kubelet logs?
Saving events
Running kubectl get event -o yaml --watch into a persistent file sounds like a simple thing to do. I think when you watch events as they arrive, you see them pre-dedup.
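If you would rather run this as a dedicated pod, here is a minimal Go sketch of that idea, assuming in-cluster credentials that are allowed to watch events and a hypothetical log path /var/log/k8s-events.log (back it with a persistent volume to control retention): it watches the event stream with client-go and appends each event to the file as a JSON line.

    package main

    import (
        "context"
        "encoding/json"
        "log"
        "os"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        // Assumes the pod runs with a service account that may list/watch events.
        cfg, err := rest.InClusterConfig()
        if err != nil {
            log.Fatal(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        // Hypothetical path; mount a persistent volume here to control retention.
        out, err := os.OpenFile("/var/log/k8s-events.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
        if err != nil {
            log.Fatal(err)
        }
        defer out.Close()

        // Watch events in all namespaces and append each one as a JSON line.
        w, err := client.CoreV1().Events("").Watch(context.Background(), metav1.ListOptions{})
        if err != nil {
            log.Fatal(err)
        }
        enc := json.NewEncoder(out)
        for ev := range w.ResultChan() {
            if e, ok := ev.Object.(*corev1.Event); ok {
                _ = enc.Encode(e)
            }
        }
    }

Note that the API server closes watches periodically, so a real deployment would re-establish the watch (or use an informer) when the result channel closes.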
Heapster can send events to some of the supported sinks:
https://github.com/kubernetes/heapster/blob/master/docs/sink-configuration.md
Eventrouter can send events to various sinks: https://github.com/heptiolabs/eventrouter/tree/master/sinks

Have you checked out the pod-specific events tab in the Dashboard?
Some events from a cluster I have running in GKE:

kubernetes/heapster can persist events to Google Cloud Logging (gcl) and InfluxDB, but for now there is no API to access the stored data

Related

MongoDB Atlas dedicated cluster: Should I be concerned about 'Restarts in last hour' alerts?

We’re using a standard 3-node Atlas replica set in a dedicated cluster (M10, Mongo 6.0.3, AWS) and have configured an alert to fire if the ‘Restarts in last hour’ rule exceeds 0 for any node.
https://www.mongodb.com/docs/atlas/reference/alert-conditions/#mongodb-alert-Restarts-in-Last-Hour-is
We’re seeing this alert fire every now and then and we’re wondering what this means for a node in a dedicated cluster and whether it is something to be concerned about, since I don’t think we have any control over it. Should we disable this rule or increase the restart threshold?
Thanks in advance for any advice.
(Note I've asked this over at the Mongo community support site also, but haven't received any traction yet so asking here too)
I got an excellent response on my question at the Mongo community support site:
A node restarting is not necessarily a cause for concern. However, you should investigate the cause of the restart itself to better determine if this is an issue or not. You should take a look at your Project Activity Feed to see if you can determine why the nodes are restarting. I understand you have noted this is an M10 cluster, so you should have access to the MongoDB logs; you can also check those to try to determine the cause of the node restart. If you do not have access to the logs, you can consider working with Atlas in-app chat support to diagnose the issue.
It’s always good to keep the alerts active, as they can indicate a potential problem as soon as they occur. You can consider increasing the restart threshold to reduce alert noise after concluding whether the restarts are expected or not.
In my case, having checked the activity feed, I was able to match up all the alerts we were seeing with Mongo version auto-updates on the nodes. We still wanted to keep the alert, so we've increased our threshold to fire on >1 restart per hour rather than >0, on the assumption that auto-updates won't be applied multiple times in the same hour.

Does a kubernetes reconcile have to be quick?

I'm writing a controller for a k8s CRD.
The job the controller has to do will usually be quick, but could on occasion take a really long time - let's say as much as an hour.
Is that ok for a Reconcile? Or should I move that work out of the controller into a separate pod, and have the controller monitor the progress of that pod?
I see no reason why the reconcile loop couldn't take as long as you need.
Technically speaking, a reconcile is just getting a copy of a resource (i.e. an HTTP GET, or an event if you're using the watch API), followed by a change to the resource (e.g. updating the resource's Status fields, i.e. an HTTP PUT/POST).
The only caveat is making sure the resource version you have is still the latest one when you try to change it. Including the resource version in your request should solve this problem.
More info here: https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions
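To make that concrete, here is a rough controller-runtime sketch; MyResource, its Status.Phase field, and doTheSlowWork are hypothetical placeholders. The reconcile blocks on the slow work and then updates status inside RetryOnConflict, so a stale resource version simply triggers a re-read and a retry:

    package controllers

    import (
        "context"

        "k8s.io/client-go/util/retry"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/client"

        examplev1 "example.com/myoperator/api/v1" // hypothetical API package
    )

    type MyReconciler struct {
        client.Client
    }

    // doTheSlowWork stands in for the long-running task (could take an hour).
    func doTheSlowWork(ctx context.Context, obj *examplev1.MyResource) (string, error) {
        // ... the actual work goes here ...
        return "Done", nil
    }

    func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        var obj examplev1.MyResource
        if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
            return ctrl.Result{}, client.IgnoreNotFound(err)
        }

        // The reconcile simply blocks until the slow work finishes.
        phase, err := doTheSlowWork(ctx, &obj)
        if err != nil {
            return ctrl.Result{}, err
        }

        // Re-read and update status, retrying if our resource version went stale.
        err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
            var latest examplev1.MyResource
            if err := r.Get(ctx, req.NamespacedName, &latest); err != nil {
                return err
            }
            latest.Status.Phase = phase // hypothetical status field
            return r.Status().Update(ctx, &latest)
        })
        return ctrl.Result{}, err
    }

The trade-off of blocking is that the reconcile occupies one of the controller's worker goroutines for the whole duration; if that matters, hand the work to a Job or separate pod and return ctrl.Result{RequeueAfter: ...} to poll its progress instead, as the question suggests.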

Is it possible to track down very rare failed requests using linkerd?

Linkerd's docs explain how to track down failing requests using the tap command, but in some cases the success rate might be very high, with only a single failed request every hour or so. How is it possible to track down those requests that are considered "unsuccessful"? Perhaps a way to log them somewhere?
It sounds like you're looking for a way to configure Linkerd to trap requests that fail and dump the request data somewhere, which is not supported by Linkerd at the moment.
You do have a couple of options with the current functionality to derive some of the info that you're looking for. The Linkerd proxies record error rates as Prometheus metrics, which are consumed by Grafana to render the dashboards. When you observe one of these infrequent errors, you can use the time-window functionality in Grafana to find the precise time that the error occurred, then refer to the service log to see if there are any corresponding error messages there. If the error is coming from the service itself, you can add as much logging info about the request as you need in order to help solve the problem.
Another option, which I haven't tried myself, is to integrate linkerd tap into your monitoring system to collect the request info and save the data for the requests that fail. There's a caveat here in that you will want to be careful about leaving a tap command running, because it will continuously collect data from the tap control plane component, which will add load to that service.
Perhaps a more straightforward approach would be to ensure that all the proxy logs and service logs are written to a long-term store like Splunk, an ELK stack (Elasticsearch, Logstash, and Kibana), or Loki. Then you can set up alerting (Prometheus Alertmanager, for example) to send a notification when a request fails, and match the time of the failure with the logs that have been collected.
You could also look into adding distributed tracing to your environment. Depending on the implementation that you use (Jaeger, Zipkin, etc.), I think the interface will allow you to inspect the details of the request for each trace.
One final thought: since Linkerd is an open source project, I'd suggest opening a feature request with specifics on the behavior that you'd like to see and work with the community to get it implemented. I know the roadmap includes plans to be able to see the request bodies using linkerd tap and this sounds like a good use case for having those bodies.

Why should I store kubernetes deployment configuration into source control if kubernetes already keeps track of it?

One of the documented best practices for Kubernetes is to store the configuration in version control. It is mentioned in the official best practices and also summed up in this Stack Overflow question. The reason is that this is supposed to speed-up rollbacks if necessary.
My question is, why do we need to store this configuration if this is already stored by Kubernetes and there are ways with which we can easily go back to a previous version of the configuration using for example kubectl? An example is a command like:
kubectl rollout history deployment/nginx-deployment
Isn't storing the configuration an unnecessary duplication of a piece of information that we will then have to keep synchronized?
The reason I am asking this is that we are building a configuration service on top of Kubernetes. The user will interact with it to configure multiple deployments. I was wondering if we should keep a history of the Kubernetes configuration and the content of ConfigMaps in a database for possible rollbacks, or if we should just rely on Kubernetes to retrieve the current configuration and roll back to previous versions of it.
To your point, you can use Kubernetes as your store of configuration; it's just that you probably shouldn't want to. By storing configuration as code, you get several benefits:
Configuration changes get regular code reviews.
They get versioned, are diffable, etc.
They can be tested, linted, and whatever else you desire.
They can be refactored, share code, and be documented.
And all this happens before actually being pushed to Kubernetes.
That may seem bad ("but then my configuration is out of date!"), but keep in mind that configuration is actually never in date - just because you told Kubernetes you want 3 replicas running doesn't mean there are, or if there were that 1 isn't temporarily down right now, and so on.
Configuration expresses intent. It takes a different process to actually notice when your intent changes or doesn't match reality, and make it so. For Kubernetes, that storage is etcd and it's up to the master to, in a loop forever, ensure the stored intent matches reality. For you, the storage is source control and whatever process you want, automated or not, can, in a loop forever, ensure your code eventually becomes reflected in Kubernetes.
The rollback command, then, is just a very fast shortcut to "please do this right now!". It's for when your configuration intent was wrong and you don't have time to fix it. As soon as you roll back, you should chase your configuration and update it there as well. In a sense, this is indeed duplication, but it's a rare event compared to the normal flow, and the overall benefits outweigh this downside.
A Kubernetes cluster doesn't store your configuration; it runs it, just as your server runs your application code.

Where do I submit events on failure of my custom operator?

I'm working on a MySQL users operator and I'm somewhat stuck on what's the proper way to report any issues.
The plan is to watch the MysqlUser CRD and create Secrets and MySQL users in the specified DB. Obviously, either of those can go wrong, at which point I need to report an error.
Some k8s objects track events in status.conditions. There's also the Event object, but so far I've only seen that used by the kubelet / controller-manager.
If, say, I have a problem creating a MySQL user because my operator cannot talk to MySQL, but otherwise the CRD is valid, should it go to an Event or to the CRD's status?
CRDs do not have a status part yet (as of 1.7). Notifying via events is perfectly fine; that's the reason for having them in the first place.
This sounds similar to events reported from a volume plugin (kubelet) where, for example, the kubelet is unable to mount a volume from an NFS server because the server address is invalid and it therefore cannot talk to it.
Tracking events in status.conditions is less useful in this scenario since users typically have no control over how the kubelet (or the operator, in your case) interacts with the underlying resources. In general, status.conditions only signals the status of the object, not why it is in that condition.
This is just my understanding of how to make the choice. I don't know if there are any rules around it.
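For illustration, here is a hedged Go sketch of both approaches; the MysqlUser type, the recorder name, and the reason strings are invented for the example, and on current Kubernetes versions CRDs do support a status subresource, so conditions are an option too. With controller-runtime the recorder would typically come from mgr.GetEventRecorderFor.

    package controllers

    import (
        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/tools/record"

        examplev1 "example.com/mysql-operator/api/v1" // hypothetical API package
    )

    type MysqlUserReconciler struct {
        Recorder record.EventRecorder // e.g. mgr.GetEventRecorderFor("mysql-user-operator")
    }

    // Option 1: emit an Event attached to the custom resource; it shows up in
    // "kubectl describe mysqluser <name>" and "kubectl get events".
    func (r *MysqlUserReconciler) reportMysqlFailure(u *examplev1.MysqlUser, err error) {
        r.Recorder.Event(u, corev1.EventTypeWarning, "MysqlUnreachable",
            "cannot create MySQL user: "+err.Error())
    }

    // Option 2: record the failure as a status condition on the resource itself
    // (assumes MysqlUser.Status.Conditions is a []metav1.Condition).
    func (r *MysqlUserReconciler) markNotReady(u *examplev1.MysqlUser, err error) {
        meta.SetStatusCondition(&u.Status.Conditions, metav1.Condition{
            Type:    "Ready",
            Status:  metav1.ConditionFalse,
            Reason:  "MysqlUnreachable",
            Message: err.Error(),
        })
    }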