Prometheus statsd-exporter - how to tag status code in request duration metric (histogram) - kubernetes

I have set up statsd-exporter to scrape metrics from a gunicorn web server. My goal is to filter the request duration metric to only successful requests (non-5xx), but in statsd-exporter there is no way to tag the status code in the duration metric. Can anyone suggest a way to add the status code to the request duration metric, or a way to filter only successful request durations in Prometheus?
In particular I want to extract a successful-request duration histogram from statsd-exporter into Prometheus.

To export successful-request duration histogram metrics from the gunicorn web server to Prometheus you would need to add this functionality in the gunicorn source code.
First take a look at the code that exports statsd metrics here.
You should see this piece of code:
status = resp.status
...
self.histogram("gunicorn.request.duration", duration_in_ms)
By changing the code to something like this:
self.histogram("gunicorn.request.duration.%d" % status, duration_in_ms)
from this moment you will have metric names exported with status codes, like gunicorn_request_duration_200 or gunicorn_request_duration_404 etc.
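For context, a minimal sketch of how the patched access() method in gunicorn's instrument/statsd.py could look (the exact surrounding code varies between gunicorn versions, so treat this as an illustration rather than a drop-in patch):

# Sketch of a patched access() on gunicorn's Statsd logger class.
# The duration computation and status parsing mirror the upstream
# code; only the histogram metric name changes.
def access(self, resp, req, environ, request_time):
    duration_in_ms = request_time.seconds * 1000 + float(request_time.microseconds) / 10 ** 3
    status = resp.status
    if isinstance(status, str):
        status = int(status.split(None, 1)[0])  # "200 OK" -> 200
    # Embed the status code in the metric name so a statsd_exporter
    # mapping rule can later turn it into a label.
    self.histogram("gunicorn.request.duration.%d" % status, duration_in_ms)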
You can also modify it a little bit and move the status codes to a label, by adding a configuration like the one below to your statsd_exporter:
mappings:
  - match: gunicorn.request.duration.*
    name: "gunicorn_http_request_duration"
    labels:
      status: "$1"
      job: "gunicorn_request_duration"
So your metrics will now look like this:
# HELP gunicorn_http_request_duration Metric autogenerated by statsd_exporter.
# TYPE gunicorn_http_request_duration summary
gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.5"} 2.4610000000000002e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.9"} 2.4610000000000002e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.99"} 2.4610000000000002e-06
gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="200"} 2.4610000000000002e-06
gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="200"} 1
gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.5"} 3.056e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.9"} 3.056e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.99"} 3.056e-06
gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="404"} 3.056e-06
gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="404"} 1
And now, to query all metrics except those with a 5xx status in Prometheus, you can run:
gunicorn_http_request_duration{status=~"[^5].*"}
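If you need the same filtered series programmatically, here is a minimal sketch against the Prometheus HTTP API (the localhost:9090 address is an assumption; adjust it to your Prometheus service):

import requests

# Run the same instant query through the Prometheus HTTP API and
# print every non-5xx duration series it returns.
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": 'gunicorn_http_request_duration{status=~"[^5].*"}'},
)
for series in resp.json()["data"]["result"]:
    print(series["metric"], series["value"])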
Let me know if it was helpful.

Related

Metrics from spring batch are not pushed to prometheus push gateway

I followed the approaches mentioned in this post. Basically I have my local Prometheus and push gateway set up using Docker, from the Spring Batch examples.
I have the dependencies below added in my build.gradle, which means the PrometheusPushGatewayManager bean is auto-configured and should push metrics to the configured gateway.
implementation("io.micrometer:micrometer-registry-prometheus:1.8.4")
implementation("io.prometheus:simpleclient_pushgateway:0.16.0")
My application.yml looks like this:
metrics:
  export:
    prometheus:
      enabled: true
      pushgateway:
        enabled: true
        base-url: http://0.0.0.0:9091
        job: main-job
        push-rate: 5s
      descriptions: true
But when I navigate to the /metrics endpoint, the metrics all have a value of 0, for example:
spring_batch_step_seconds_max{instance="",job="job",job_name="job-job-flow",name="process-5.csv",status="FAILED"} 0
spring_batch_step_seconds_max{instance="",job="job",job_name="job-job-flow",name="process-6.csv",status="COMPLETED"} 0
spring_batch_step_seconds_max{instance="",job="job",job_name="job-job-flow",name="process-7.csv",status="FAILED"} 0
spring_batch_step_seconds_max{instance="",job="job",job_name="job-job-flow",name="process-2csv",status="FAILED"} 0
spring_batch_step_seconds_max{instance="",job="job",job_name="job-job-flow",name="start-job-job",status="COMPLETED"} 0
I've checked this post, which indicates that we need to configure a registry, but if I'm using the auto-configured PrometheusPushGatewayManager by adding the simpleclient_pushgateway dependency, how do I configure a registry?
Setting a breakpoint and viewing the value of Metrics.globalRegistry.meters[1] shows values like SampleImpl{duration(seconds)=392.074203242, duration(nanos)=3.92074203242E11, startTimeNanos=1098399187818886}. So the values are captured, just not pushed properly.
Am I missing some configuration to get the metrics pushed properly to the gateway?
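As a quick way to rule out the gateway itself, a minimal push with the prometheus_client Python library (reusing the base-url and job name from the config above) can confirm whether the Pushgateway accepts and exposes pushed metrics; if this shows up under /metrics, the problem is on the Spring side:

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

# Push a trivial metric to the same gateway the Spring app targets.
registry = CollectorRegistry()
g = Gauge('debug_push_check', 'Sanity-check metric', registry=registry)
g.set(42)
push_to_gateway('0.0.0.0:9091', job='main-job', registry=registry)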

Quarkus service endpoint always returns system info only

I can confirm that the endpoints are working in the unit tests through io.restassured.RestAssured. However, after I launch the service, every endpoint always returns a page of system info, e.g.
# HELP kafka_producer_node_request_total The total number of requests sent
# TYPE kafka_producer_node_request_total counter
kafka_producer_node_request_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",node_id="node--1",} 2.0
# HELP kafka_producer_connection_close_total The total number of connections closed
# TYPE kafka_producer_connection_close_total counter
kafka_producer_connection_close_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",} 0.0
# HELP kafka_producer_request_total The total number of requests sent
# TYPE kafka_producer_request_total counter
kafka_producer_request_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",} 2.0
# HELP kafka_producer_node_response_total The total number of responses received
# TYPE kafka_producer_node_response_total counter
kafka_producer_node_response_total{client_id="kafka-producer-metric-message-out",kafka_version="2.5.0",node_id="node--1",} 2.0
# HELP kafka_producer_node_response_rate The number of responses received per second
# TYPE kafka_producer_node_response_rate gauge
From the log I can see that the DBs are connected and schemas are evolved,
but where does this info come from, and why does it hijack my normal endpoints?
What a coincidence: it turned out that my application endpoint is also /metrics, and quarkus.http.non-application-root-path=/, so it kept getting hijacked by the Quarkus metrics endpoint.
Thanks to @loicmathieu.
The solution is to reconfigure the Quarkus non-application endpoints:
quarkus.http.non-application-root-path=/
quarkus.smallrye-health.root-path=/quarkus-metrics/health
quarkus.smallrye-health.ui.root-path=/quarkus-metrics/health-ui

How to wait for tekton pipelineRun conditions

I have the following code within a GitLab pipeline, which results in some kind of race condition:
kubectl apply -f pipelineRun.yaml
tkn pipelinerun logs -f pipeline-run
The tkn command immediately exits, since the pipelineRun object is not yet created. There is one very nice solution for this problem:
kubectl apply -f pipelineRun.yaml
kubectl wait --for=condition=Running --timeout=60s pipelinerun/pipeline-run
tkn pipelinerun logs -f pipeline-run
Unfortunately this is not working as expected, since Running does not seem to be a valid condition for a pipelineRun object. So my question is: what are the valid conditions of a pipelineRun object?
I didn't search too far and wide, but it looks like they only have two condition types imported from the knative.dev project?
https://github.com/tektoncd/pipeline/blob/main/vendor/knative.dev/pkg/apis/condition_types.go#L32
The link above is to the condition types imported by the pipeline source code, of which it looks like Tekton only uses "Ready" and "Succeeded".
const (
    // ConditionReady specifies that the resource is ready.
    // For long-running resources.
    ConditionReady ConditionType = "Ready"
    // ConditionSucceeded specifies that the resource has finished.
    // For resource which run to completion.
    ConditionSucceeded ConditionType = "Succeeded"
)
But there may be other imports of this nature elsewhere in the project.
Tekton TaskRuns and PipelineRuns only use a condition of type Succeeded.
Example:
conditions:
  - lastTransitionTime: "2020-05-04T02:19:14Z"
    message: "Tasks Completed: 4, Skipped: 0"
    reason: Succeeded
    status: "True"
    type: Succeeded
The different statuses and messages available for the Succeeded condition are listed in the documentation:
TaskRun: https://tekton.dev/docs/pipelines/taskruns/#monitoring-execution-status
PipelineRun: https://tekton.dev/docs/pipelines/pipelineruns/#monitoring-execution-status
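If you need to wait on that condition from a script, a minimal polling sketch with the kubernetes Python client could look like this (the default namespace, the pipeline-run name, and the tekton.dev/v1beta1 API version are assumptions taken from the question and may need adjusting):

import time
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Poll the PipelineRun until its Succeeded condition leaves Unknown,
# i.e. until the run reaches a terminal state.
while True:
    run = api.get_namespaced_custom_object(
        group="tekton.dev", version="v1beta1", namespace="default",
        plural="pipelineruns", name="pipeline-run",
    )
    conditions = run.get("status", {}).get("conditions", [])
    succeeded = next((c for c in conditions if c["type"] == "Succeeded"), None)
    if succeeded and succeeded["status"] != "Unknown":
        # "True" means the run completed, "False" means it failed.
        print(succeeded["status"], succeeded.get("reason"))
        break
    time.sleep(5)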
As a side note, there is an activity timeout available in the API. That timeout is not surfaced to the CLI options though. You could create a tkn feature request for that.

How to get YARN "Memory Total" and "VCores Total" metrics programmatically in pyspark

I've been looking around this:
https://docs.actian.com/vectorhadoop/5.0/index.html#page/User/YARN_Configuration_Settings.htm
but none of those configs are what I need.
"yarn.nodemanager.resource.memory-mb" was promising, but it's only for the node manager it seems and only gets master's mem and cpu, not the cluster's.
int(hl.spark_context()._jsc.hadoopConfiguration().get('yarn.nodemanager.resource.memory-mb'))
You can access those metrics from the YARN ResourceManager REST API.
URL: http://rm-http-address:port/ws/v1/cluster/metrics
metrics:
totalMB
totalVirtualCores
Example response (can also be XML):
{ "clusterMetrics": {
"appsSubmitted":0,
"appsCompleted":0,
"appsPending":0,
"appsRunning":0,
"appsFailed":0,
"appsKilled":0,
"reservedMB":0,
"availableMB":17408,
"allocatedMB":0,
"reservedVirtualCores":0,
"availableVirtualCores":7,
"allocatedVirtualCores":1,
"containersAllocated":0,
"containersReserved":0,
"containersPending":0,
"totalMB":17408,
"totalVirtualCores":8,
"totalNodes":1,
"lostNodes":0,
"unhealthyNodes":0,
"decommissioningNodes":0,
"decommissionedNodes":0,
"rebootedNodes":0,
"activeNodes":1,
"shutdownNodes":0 } }
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API
All you need is to figure out your ResourceManager web address and port; check your configuration files. I can't help you with that, since I don't know where you manage YARN.
When you have the URL, access it with Python:
import requests

url = 'http://rm-http-address:port/ws/v1/cluster/metrics'
response = requests.get(url)
# Parse the JSON response and pull out the cluster-wide totals
metrics = response.json()['clusterMetrics']
total_mb = metrics['totalMB']
total_vcores = metrics['totalVirtualCores']
Of course, no Hadoop or Spark context is needed in this solution.
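If you'd rather not hardcode the ResourceManager address, it can often be read from the same Hadoop configuration the question already queries; a sketch, assuming an existing SparkContext sc (in HA setups the key may be unset or carry an rm-id suffix such as yarn.resourcemanager.webapp.address.rm1):

# Read the ResourceManager webapp address from the Hadoop configuration
# and build the cluster metrics URL from it.
rm_address = sc._jsc.hadoopConfiguration().get('yarn.resourcemanager.webapp.address')
url = 'http://%s/ws/v1/cluster/metrics' % rm_address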

Haproxy exporter unable to fetch data

I am using haproxy_exporter with Prometheus, and have added Prometheus as a datasource in Grafana, with the HAProxy plugin using Prometheus as its datasource, in order to fetch HAProxy stats and show them on the Grafana server. But I am not able to get any output from it.
When I run the command below, I get the error invalid URL port.
./haproxy_exporter --no-haproxy.ssl-verify --haproxy.scrape-uri="http://user:$(cat pwfile)192.168.1.10:10000/haproxy/stats;csv"
OUTPUT:
INFO[0000] Starting haproxy_exporter (version=0.9.0, branch=master, revision=0cae8ee3e3f3b7c517db2cc68f386672d8b1b6a7) source=haproxy_exporter.go:495
INFO[0000] Build context (go=go1.10.1, user=root@rlinux57, date=20180724-16:08:06) source=haproxy_exporter.go:496
INFO[0000] Listening on :9101 source=haproxy_exporter.go:521
ERRO[0013] Can't scrape HAProxy: Get http://admin:abEDokA("192.168.1.10:10000/haproxy/stats;csv: invalid URL port abEDokA("192.168.1.10:10000" source=haproxy_exporter.go:315
And when I placed an @ sign between the password and the IP address, such as ./haproxy_exporter --no-haproxy.ssl-verify --haproxy.scrape-uri="http://admin:abEDokA("@192.168.1.10:10000/haproxy/stats;csv"
it gives the error below:
INFO[0000] Starting haproxy_exporter (version=0.9.0, branch=master, revision=0cae8ee3e3f3b7c517db2cc68f386672d8b1b6a7) source=haproxy_exporter.go:495
INFO[0000] Build context (go=go1.10.1, user=root@rlinux57, date=20180724-16:08:06) source=haproxy_exporter.go:496
FATA[0000] parse http://admin:abEDokA("@192.168.1.10:10000/haproxy/stats;csv: net/url: invalid userinfo source=haproxy_exporter.go:500
And my prometheus settings are:
- job_name: 'haproxy'
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  static_configs:
    - targets: ['localhost:9101']
You need the @ in there, and you might need to get rid of the " in your password. Maybe simply escaping it (\") could work, but the second error message suggests haproxy_exporter correctly receives the URL as http://admin:abEDokA("@192.168.1.10:10000/haproxy/stats;csv but is then unable to parse it.
Yup: according to http://www.ietf.org/rfc/rfc1738.txt, " is not a valid character in a URL. You can get around it by using its escape, %22.
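For instance, a quick way to produce the escaped form is Python's standard urllib (the credentials are just the ones from the question):

from urllib.parse import quote

# Percent-encode every reserved character in the password:
# '(' becomes %28 and '"' becomes %22.
password = 'abEDokA("'
encoded = quote(password, safe='')
print('http://admin:%s@192.168.1.10:10000/haproxy/stats;csv' % encoded)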