How to connect Apache Storm with Grafana?

I am collecting data in JSON format and processing it in real time with Apache Storm. I would now like to use Grafana to build real-time visualizations of the processed data. Is there a way to connect Storm to Grafana?
I haven't found much information on the topic; any help would be appreciated.

Grafana is a visualization tool that sits on top of a datastore such as Prometheus or ADX. You first have to enable collection of your Storm metrics into such a datastore.
Here is one example: https://github.com/wizenoze/storm-metrics-reporter-prometheus
Once this is done, any metrics reported from the code (counters, gauges, JMX metrics) are saved in the datastore and can then be visualized in Grafana.
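To give a feel for what ends up in the datastore, here is a minimal sketch of the Prometheus text exposition format that a scrape endpoint serves. It is not the reporter linked above (that one is Java and plugs into Storm); the metric names and labels are made up for illustration, and the sketch assumes each metric name appears only once.

```python
def render_prometheus(metrics):
    """Render (name, kind, labels, value) tuples as Prometheus exposition text.

    Assumes every metric name is unique, so each gets exactly one TYPE line.
    """
    lines = []
    for name, kind, labels, value in metrics:
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical Storm-side metrics; real names depend on your topology and reporter.
example = [
    ("storm_tuples_processed_total", "counter", {"component": "parse-bolt"}, 1024),
    ("storm_queue_depth", "gauge", {"component": "parse-bolt"}, 17),
]
print(render_prometheus(example))
```

Prometheus scrapes text like this over HTTP at regular intervals, and Grafana then queries Prometheus as a data source.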

Related

Apache Beam on Google Dataflow: Collecting metrics from within the main method

I have a batch pipeline that pulls data from a Cassandra table and writes it to Kafka. I would like to get various statistics about the Cassandra data, for example the total number of records in the table or the number of records with a null value in a given column. I tried to use Beam metrics. Although the correct counts show up in the Google Cloud console after the pipeline has finished, I am unable to read them in the main program after the pipeline.run() call; doing so throws an "unsupported" exception. I am using Google Dataflow and bundle the pipeline as a Flex Template. Is there any way to get this to work?
If you can get the job ID, Dataflow offers a public API for querying metrics, which is the same one used internally. It might be easier to get them from Stackdriver; see, e.g., "Collecting Application Metrics From Google cloud Dataflow".

Are time-related OpenTelemetry metrics an anti-pattern?

When setting up metrics and telemetry for my API, is it an anti-pattern to track something like "request-latency" as a metric (possibly in addition to) tracking it as a span?
For example, say my API makes a request to another API in order to generate a response. If I want to track latency information such as:
My API's response latency
The latency for the request from my API to the upstream API
DB request latency
Etc.
That seems like a good candidate for using a span but I think it would also be helpful to have it as a metric.
Is it a bad practice to duplicate the OTEL data capture (as both a metric and a span)?
I can likely extract that information and avoid duplication, but it might be simpler to log it as a metric as well.
Thanks in advance for your help.
I would say traces and metrics each have their own use cases. Traces usually have a short retention period (AWS X-Ray: 30 days), and you can generate metrics from traces only over a short time window (AWS X-Ray: 24 hours). If you need a longer time period, those queries will be expensive (and slow). So metrics stored in a time-series DB are a good fit for longer-term statistics.
BTW: there is also an experimental Span Metrics Processor, which you can use to generate Prometheus metrics directly from spans with the OTEL Collector, with no additional app instrumentation or code.
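To illustrate the idea of deriving a latency metric from spans instead of instrumenting twice, here is a conceptual sketch in plain Python. It is not the Collector's implementation; the bucket boundaries, span names, and durations are all hypothetical.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

# Hypothetical bucket upper bounds in milliseconds.
BUCKETS_MS = [10, 50, 100, 500]


def bucket_label(duration_ms):
    """Return the label of the first bucket whose upper bound covers the duration."""
    idx = bisect_left(BUCKETS_MS, duration_ms)
    return f"le_{BUCKETS_MS[idx]}ms" if idx < len(BUCKETS_MS) else "le_inf"


def latency_histograms(spans):
    """Count spans per (span name, latency bucket) from (name, duration_ms) pairs."""
    histograms = defaultdict(Counter)
    for name, duration_ms in spans:
        histograms[name][bucket_label(duration_ms)] += 1
    return {name: dict(counts) for name, counts in histograms.items()}


# Made-up finished spans: (span name, duration in ms).
spans = [("GET /users", 8), ("GET /users", 42), ("db.query", 3), ("GET /users", 700)]
print(latency_histograms(spans))
```

The resulting counts are small and cheap to keep for a long time, while the spans themselves can expire with the trace retention period.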

Can Graphite or Grafana be used to monitor PySpark metrics?

In a PySpark project we call dataframe.foreachPartition(func), and inside func we make aiohttp calls to transfer data. What monitoring tools can be used to track metrics such as data rate, throughput, and elapsed time? Can we use StatsD with Graphite or Grafana in this case (they are preferred if possible)? Thanks.
Here is my solution. I used PySpark accumulators to collect the metrics (number of HTTP calls, payload sent per call, etc.) in each partition. On the driver node, I assign the accumulators' values to StatsD gauges and send them to a Graphite server, then visualize them in a Grafana dashboard. It has worked well so far.
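The driver-side half of that approach could look roughly like the sketch below. It uses only the standard library and the plain StatsD wire format for gauges ("name:value|g" over UDP) rather than a StatsD client package. The metric prefix, host, port, and the numbers standing in for accumulator values are assumptions, not taken from the original setup.

```python
import socket


def format_gauges(metrics, prefix="pyspark.job"):
    """Render a dict of metric values as StatsD gauge lines ("name:value|g")."""
    return [f"{prefix}.{name}:{value}|g" for name, value in sorted(metrics.items())]


def send_gauges(lines, host="127.0.0.1", port=8125):
    """Send each gauge line to a StatsD daemon as one UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))


# On the driver, after the Spark action has finished, read accumulator.value for
# each accumulator. The numbers below are hypothetical stand-ins for those values.
driver_side_values = {"http_calls": 120, "payload_bytes": 48000}
lines = format_gauges(driver_side_values)
print(lines)  # ['pyspark.job.http_calls:120|g', 'pyspark.job.payload_bytes:48000|g']
```

Calling send_gauges(lines) would then push the values to whatever StatsD daemon feeds your Graphite server. Accumulator values are only reliable on the driver once the action has completed, which is why the send happens there and not inside func.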

Stackdriver throughput metric of Apache Beam streaming job

I have a streaming job implemented on top of Apache Beam, which reads messages from Apache Kafka, processes them and outputs them into BigTable.
I would like to get throughput metrics of ingress/egress inside this job i.e. how many msg/sec the job is reading and how many msg/sec it's writing.
Looking at the graph visualization, I see that there is a throughput metric
(the picture below is an example).
However, according to the documentation, it is not available in Stackdriver.
Is there an existing way to get this metric?
We are looking into publishing a throughput metric to Stackdriver, but one does not currently exist. The ElementCount (element_count in Stackdriver) metric is the only metric available to that UI or through Stackdriver that could be used to measure throughput. If that's displaying on the graph, it must be some computation over that metric. Unfortunately, the metric is exported as a Gauge metric to Stackdriver, so it can't be directly interpreted as a rate in Stackdriver.
A small secondary point: Dataflow doesn't actually export a metric measuring flow into and out of external sources. The ElementCount metric measures flow into inter-transform collections only. But as long as your read / write transforms are basically pass-throughs, the flow into / out of the adjacent collection should be sufficient.
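If you pull the element_count samples out yourself, a rate can be approximated from the difference between consecutive samples. The sketch below assumes each sample is a (unix seconds, running total) pair for one collection; that sample shape and the numbers are assumptions for illustration, not something Stackdriver returns in this form.

```python
def rates_per_second(samples):
    """Turn (unix_seconds, cumulative_count) samples into per-interval rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            continue  # skip out-of-order samples and counter resets
        rates.append((t1, (c1 - c0) / elapsed))
    return rates


# Hypothetical samples taken one minute apart.
samples = [(0, 0), (60, 6000), (120, 15000)]
print(rates_per_second(samples))  # [(60, 100.0), (120, 150.0)]
```

Here the job would have processed 100 msg/sec in the first minute and 150 msg/sec in the second.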

Using druid graphite emitter extension

I'm trying out the graphite emitter plugin in druid to collect certain druid metrics in graphite during druid performance tests.
The intent is to then query these metrics using the REST API provided by graphite in order to characterize the performance of the deployment.
However, the numbers returned by graphite don't make sense. So, I wanted to check if I'm interpreting the results in the right manner.
Setup
The kafka indexing service is used to ingest data from kafka into druid.
I've enabled the graphite emitter and provided a whitelist of metrics to collect.
Then I pushed 5000 events to the kafka topic being indexed. Using kafka-related tools, I confirmed that the messages are indeed stored in the kafka logs.
Next, I retrieved the ingest.rows.output metric from graphite using the following call:
curl "http://<Graphite_IP>:<Graphite_Port>/render/?target=druid.test.ingest.rows.output&format=csv"
Following are the results I got:
druid.test.ingest.rows.output,2017-02-22 01:11:00,0.0
druid.test.ingest.rows.output,2017-02-22 01:12:00,152.4
druid.test.ingest.rows.output,2017-02-22 01:13:00,97.0
druid.test.ingest.rows.output,2017-02-22 01:14:00,0.0
I don't know how these numbers need to be interpreted:
Questions
What do the numbers 152.4 and 97.0 in the output indicate?
How can the 'number of rows' be a floating point value like 152.4?
How do these numbers relate to the '5000' messages I pushed to Kafka?
Thanks in advance,
Jithin
According to the Druid metrics page, it indicates the number of events after rollup.
The floating-point values come from averaging over the time window that the Graphite server uses to summarize data.
So if those metrics are complete, your initial 5000 rows were compressed to roughly 250 rows (152.4 + 97.0 = 249.4).
I figured out the issue after some experimentation. Since my kafka topic has multiple partitions, druid runs multiple tasks to index the kafka data (one task per partition). Each of these tasks reports various metrics at regular intervals. For each metric, the number obtained from graphite for each time interval is the average of the values reported by all the tasks for the metric in that interval. In my case above, had the aggregation function been sum (instead of average), the value obtained from graphite would have been 5000.
However, I wasn't able to figure out whether the averaging is done by the graphite-emitter druid plugin or by graphite.
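The difference between the two aggregation functions is easy to see with a toy example. The per-task numbers below are made up (they are not the values my tasks actually reported); they are only chosen so that they add up to the 5000 events pushed to Kafka.

```python
def aggregate(values, method):
    """Combine the values reported for one metric in one interval."""
    if method == "sum":
        return sum(values)
    if method == "average":
        return sum(values) / len(values)
    raise ValueError(f"unknown aggregation method: {method}")


# Hypothetical ingest.rows.output reports from four indexing tasks in one interval.
reports = [1200, 1300, 1250, 1250]
print(aggregate(reports, "average"))  # 1250.0
print(aggregate(reports, "sum"))      # 5000
```

With averaging, the stored number depends on how many tasks reported in the interval and says little about the total; with sum, it matches the number of rows ingested.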