Build a Stackdriver dashboard that contains a filtered list of log entries

We are evaluating Stackdriver as an alternative to our ELK stack, but I'm missing a few features that I have in Kibana (1).
Most importantly, I can't find a way to show the actual log entries in a Stackdriver dashboard; I can only show graphs based on the logs. Switching between two tabs all the time (2 and 3) and adapting the filters in both of them seems very inconvenient for log/error analysis.
Is there a way that I can have a dashboard that also shows logs (based on the filters in the dashboard search)?

There is currently no way to show raw log entries in the Metrics Dashboard, unfortunately.
You can file a feature request to add this functionality to Stackdriver.

Related

Is the aggregation information in the Kubernetes dashboard available from the CLI or through an API?

I've seen the Kubernetes dashboard track some information in the form of:
X happened 14 times and the last occurrence was at time T
Where is this data coming from? Can I query for it using kubectl? Is there a K8s API for this information as well? Or is the dashboard running some kind of aggregation internally?
X happened 14 times and the last occurrence was at time T
That is out-of-the-box dashboard functionality, and you would most probably have to dig into the code to answer this question precisely.
The thing is, of course, that the dashboard relies on open data that you can collect on your own using kubectl; the only question is what exactly you want to see as output. Combining kubectl with greps, sorts, seds, etc. will give you the same information you just asked about. Maybe you want to create a new question and specify your exact task?
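As a minimal sketch of the kind of data kubectl already exposes: for events, the dashboard's "X happened N times, last seen at time T" lines correspond to the count and lastTimestamp fields of Kubernetes Event objects, which you can query directly (the namespace and column layout below are just examples):

    # List events in the current namespace, most recent last (add -A for all namespaces)
    kubectl get events --sort-by=.lastTimestamp

    # Pull the raw count / last-seen fields yourself
    kubectl get events -o custom-columns=REASON:.reason,COUNT:.count,LAST:.lastTimestamp,MESSAGE:.message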

Is it possible to track down very rare failed requests using linkerd?

Linkerd's docs explain how to track down failing requests using the tap command, but in some cases the success rate might be very high, with only a single failed request every hour or so. How is it possible to track down those requests that are considered "unsuccessful"? Perhaps a way to log them somewhere?
It sounds like you're looking for a way to configure Linkerd to trap requests that fail and dump the request data somewhere, which is not supported by Linkerd at the moment.
You do have a couple of options with the current functionality to derive some of the information you're looking for. The Linkerd proxies record error rates as Prometheus metrics, which are consumed by Grafana to render the dashboards. When you observe one of these infrequent errors, you can use the time window functionality in Grafana to find the precise time the error occurred, then refer to the service log to see if there are any corresponding error messages there. If the error is coming from the service itself, you can add as much logging information about the request as you need to help solve the problem.
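For example, once Grafana has given you the approximate time of the failure, something like the following pulls the service and proxy logs around that window (a sketch; the deployment name and timestamp are placeholders):

    # Fetch logs from the service container and the Linkerd proxy sidecar around the failure time
    kubectl logs deploy/my-service -c my-service --since-time=2023-04-01T12:00:00Z
    kubectl logs deploy/my-service -c linkerd-proxy --since-time=2023-04-01T12:00:00Z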
Another option, which I haven't tried myself, is to integrate linkerd tap into your monitoring system to collect the request info and save the data for the requests that fail. There's a caveat here: you will want to be careful about leaving a tap command running, because it will continuously collect data from the tap control plane component, which will add load to that service.
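A rough sketch of that idea (the deployment name is a placeholder, the command is linkerd viz tap on newer releases, and the exact output format varies between Linkerd versions):

    # Stream live request metadata and keep only responses that are not HTTP 200
    linkerd tap deploy/web | grep "rsp" | grep -v ":status=200"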
Perhaps a more straightforward approach would be to ensure that all the proxy logs and service logs are written to a long-term store like Splunk, an ELK stack (Elasticsearch, Logstash, and Kibana), or Loki. Then you can set up alerting (Prometheus Alertmanager, for example) to send a notification when a request fails, and match the time of the failure with the logs that have been collected.
You could also look into adding distributed tracing to your environment. Depending on the implementation that you use (Jaeger, Zipkin, etc.), the interface should allow you to inspect the details of the request for each trace.
One final thought: since Linkerd is an open source project, I'd suggest opening a feature request with specifics on the behavior that you'd like to see and work with the community to get it implemented. I know the roadmap includes plans to be able to see the request bodies using linkerd tap and this sounds like a good use case for having those bodies.

How to add an alert panel to all Grafana dashboards

We are using Grafana to visualize InfluxDB data, and multiple dashboards have been created in it. Because of technical issues or downtime, there may sometimes be no new data in InfluxDB to display in the dashboards.
Is there a way to add a panel to all the dashboards with an alert message about the downtime, so that dashboard users are notified right there and don't have to go anywhere else?
Thanks
I don't think it's possible to configure a pop-up like that in Grafana.
Consider using another notification channel instead (e-mail, Discord, Slack, ...).
If you really want a pop-up, it won't be configured in Grafana but in JavaScript: you would have to customize your Grafana page yourself, and I can't help you with that.
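If you go the notification-channel route, a sketch of creating one through Grafana's legacy alerting HTTP API could look roughly like this (the host, API key, and webhook URL are placeholders; newer Grafana releases with unified alerting use contact points instead):

    # Create a Slack notification channel via the legacy alerting API
    curl -s -X POST "http://grafana.example.com:3000/api/alert-notifications" \
      -H "Authorization: Bearer <api-key>" \
      -H "Content-Type: application/json" \
      -d '{"name": "downtime-slack", "type": "slack", "settings": {"url": "https://hooks.slack.com/services/..."}}'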

Kubernetes dashboard: How to access more than 15 minutes of CPU and memory usage in the Web UI dashboard

The web UI dashboard (which is reached by running 'kubectl proxy') is very useful and gives a great high-level overview of the cluster. However, the CPU and memory usage graphs seem to be hardcoded to display only the last 15 minutes. I am not able to find any setting that allows me to increase this, nor could I find any documentation on how to do it. Our team is exploring setting up Grafana/InfluxDB and other services to get more detailed metrics, but it would be nice if there were an option to increase the time range in the web UI dashboard.

Does it make sense to use ELK to collect page metrics?

We would like to collect some interesting user-related metrics on our website (e.g. "user edited profile", or "user clicked on downloaded file", etc.) and are thinking about using the ELK stack for this.
Is it a good idea to use Elasticsearch to store such events? Or would it make more sense to log them in our RDBMS?
What would be the advantages of using either of those?
(Side note: we already use Elasticsearch and PostgreSQL in our stack.)
You could save your logs in any persistent solution out there and later decide what tool to use for analyzing them.
If you want to run queries (manage your data on the fly / in real time), you could directly parse/pipe the logs generated by your applications and send them to Elasticsearch; the flow would be something like:
(your app) --> filebeat --> elasticsearch <-- Kibana
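For illustration, indexing one such event directly into Elasticsearch (skipping Filebeat) could look like the sketch below; the index name and fields are made up, and it assumes Elasticsearch 7+ reachable on localhost:9200:

    # Index a single user-event document into a hypothetical "user-events" index
    curl -s -X POST "http://localhost:9200/user-events/_doc" \
      -H "Content-Type: application/json" \
      -d '{"event": "profile_edited", "user_id": "123", "@timestamp": "2023-04-01T12:00:00Z"}'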
Just keep in mind that the ELK stack is not "cheap" and, depending on your setup, could become expensive to maintain in the long term.
In the end it depends on your use case: both solutions you mention can be used to store the data, but the way you extract/query the data is what makes the difference.