Why does Prometheus not raise an error on a missing metric? - kubernetes

We have a custom metric that is exported only when a certain error condition occurs in the app.
An alert rule that uses this custom metric is registered with the Prometheus rule manager.
Why does Prometheus not raise an error when this metric name is queried, even though the metric is not yet available in Prometheus?

It seems correct that the absence of a signal is not treated as an error.
However, it can cause problems with dashboards and alerting.
See this presentation by one of Prometheus' creators: Best Practices & Beastly Pitfalls
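To illustrate why no error shows up: an instant query for a series that does not exist comes back from the HTTP API as a successful response with an empty result vector. Below is a minimal Python sketch that checks for this case. The payload is a hand-written sample in the shape the query API returns, and the helper name `is_empty_result` is hypothetical.

```python
import json

# Hand-written sample in the shape returned by GET /api/v1/query when the
# selector matches no series: a successful response with an empty vector.
SAMPLE_RESPONSE = '{"status": "success", "data": {"resultType": "vector", "result": []}}'


def is_empty_result(raw: str) -> bool:
    """Return True when a successful query matched no series at all."""
    body = json.loads(raw)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return len(body["data"]["result"]) == 0


print(is_empty_result(SAMPLE_RESPONSE))
```

If the missing series itself should trigger an alert, PromQL's `absent()` function is one option, since it returns a value only when the selector matches nothing.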

Related

track a lot of batch jobs (start and end time)

We have a lot of jobs (for example, batch jobs) that are executed each day. Therefore we'd like to have an overview of all jobs.
→ Track start time and end time (→ complete runtime).
All of this information should be available in a visualisation.
Is InfluxDB with Grafana a good solution for this, or do you recommend another app?
I think InfluxDB and Grafana are really a good starting point to collect data from your services.
You'll also need to integrate some type of metrics library and an exporter in your code.
On Java you could use Micrometer (https://micrometer.io/) and Prometheus.
Here you can find more information about them: https://micrometer.io/docs/registry/prometheus
Once metrics are integrated in your code, you only need to configure InfluxDB to scrape your metrics endpoint and configure Grafana to use InfluxDB as a data source.

using Pushgateway to report a Summary metric to Prometheus

I'm using Prometheus to monitor an application that runs as a cron job, so I'm using Pushgateway to make my metrics available to Prometheus. One of the metrics reports how long a certain task takes to finish, and I'm using a Summary for that. My issue is that I see the same value reported for each quantile! My understanding was that the reported time for each quantile should be different.
I'm using the following to observe() the time and to push my metrics to Pushgateway:
Summary.labels(myLable).observe(Date.now() - startedAt)
gateway.pushAdd { jobName: 'test' }, (err, resp, body) ->
  console.log "Error!!" if err
and here is a screenshot which shows that I'm getting the final time for all quantiles!
I'd appreciate any comments on this!
If you only have one observation, then a Summary's quantiles will all be the same. I'm not sure what you're expecting here instead; a gauge would be the more usual way to report this.
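The point about a single observation can be checked with a small Python sketch. It uses a plain nearest-rank quantile rather than the streaming estimator a client library would use, and the durations are made-up values, so treat it as an illustration only.

```python
import math


def quantile(observations, q):
    """Nearest-rank quantile: the value at rank ceil(q * n) in sorted order."""
    if not observations:
        raise ValueError("no observations")
    ordered = sorted(observations)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# One pushed run: every quantile collapses to the only value there is.
single_run = [1234]  # task duration in milliseconds (made-up value)
print({q: quantile(single_run, q) for q in (0.5, 0.9, 0.99)})

# Several runs: the quantiles can now differ from each other.
many_runs = [100, 200, 300, 400, 1234]
print({q: quantile(many_runs, q) for q in (0.5, 0.9, 0.99)})
```

Because each cron run starts a fresh process and observes once before pushing, the Summary never sees more than one value, which is why a gauge holding the last duration fits this case better.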

Custom metric for vertx

I am using Vertx micrometer (with Prometheus option) and I would like to ask if it's possible to add a custom metric.
I would like to keep track of a number of users with a specific property so I need to call a service (via event bus) to get this number. Finally, I want to include this number in the metrics collected by Prometheus.
Thank you
Michael

How can I get down time of a specific deployment in kubernetes?

I have a use case where I need to collect the downtime of each deployment (that is, periods when all of its replicas (pods) are down at the same time).
My goal is to maintain the total downtime for each deployment since it was created.
I tried getting it from the deployment status, but the problem is that I need to make frequent calls to get the deployment and check for any downtime.
Also, the deployment status stores only the latest change, so I will miss changes that occurred between calls if there was more than one change (i.e., downtime). I will also end up making frequent calls for many deployments, which will consume more compute resources.
Is there any reliable method to collect the downtime data of a deployment?
Thanks in advance.
A monitoring tool like Prometheus would be a better solution to handle this.
As an example, below is a graph from one of our deployments for the last 2 days.
If you look at the blue line for unavailable replicas, we had one replica unavailable from about 17:00 to 10:30 (ideally the unavailable count should be zero).
This seems pretty close to what you are looking for.
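Once replica counts are recorded as a time series, the total downtime can be derived from the samples. Here is a minimal Python sketch, assuming the samples are (timestamp, available replicas) pairs exported from the monitoring system, for example from a series such as kube-state-metrics' `kube_deployment_status_replicas_available`. The function name and the sample data are hypothetical.

```python
def total_downtime(samples):
    """Sum the seconds during which a deployment had zero available replicas.

    `samples` is a time-ordered list of (unix_timestamp, available_replicas)
    pairs. Each sample's value is assumed to hold until the next sample.
    """
    downtime = 0
    for (start, available), (end, _) in zip(samples, samples[1:]):
        if available == 0:
            downtime += end - start
    return downtime


# Made-up samples taken every 60 seconds.
samples = [(0, 3), (60, 0), (120, 0), (180, 2), (240, 0), (300, 1)]
print(total_downtime(samples))
```

The resolution is limited by the scrape interval: an outage shorter than the gap between two samples can be missed entirely.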

Prometheus : Text format parsing error in line 1696: invalid metric name in comment

I am trying to use Prometheus to report on executed Hystrix commands for MongoDB. Everything is working fine except that Prometheus is not able to parse the line below and shows the target state as down.
# HELP rideshare-engine_hystrix_command_latency_total_percentile_995 DEPRECATED: Rolling percentiles of execution times for the end-to-end execution of HystrixCommand.execute() or HystrixCommand.queue() until a response is returned (or ready to return in case of queue(). The purpose of this compared with the latency_execute* percentiles is to measure the cost of thread queuing/scheduling/execution, semaphores, circuit breaker logic and other aspects of overhead (including metrics capture itself).
complete stack
I'm not sure what I am doing wrong here.
Config:
Hyphens are not valid in Prometheus metric names; use underscores instead.
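If the metric name is built from something like an application name, it can be cleaned before export. Below is a minimal Python sketch, assuming the classic metric name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`. The helper `sanitize_metric_name` is hypothetical and is not part of any Prometheus or Hystrix library.

```python
import re

# Any character outside the classic metric name alphabet [a-zA-Z0-9_:].
_INVALID_CHAR = re.compile(r"[^a-zA-Z0-9_:]")


def sanitize_metric_name(name: str) -> str:
    """Replace characters that are invalid in a metric name with underscores."""
    cleaned = _INVALID_CHAR.sub("_", name)
    # A name must not start with a digit; prefix an underscore if it does.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


print(sanitize_metric_name("rideshare-engine_hystrix_command_latency_total_percentile_995"))
```

In the case above, the hyphen comes from the `rideshare-engine` prefix, so changing that prefix where the exporter is set up would be the more direct fix.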