track a lot of batch jobs (start and end time) - grafana

We have a lot of jobs (for example, batch jobs) that are executed each day. Therefore we'd like to have an overview of all jobs.
→ track start time and end time (→ complete runtime).
All of this information should be available in a visualisation.
Is InfluxDB with Grafana a good solution for this, or do you recommend another application?

I think InfluxDB and Grafana are a really good starting point for collecting data from your services.
You'll also need to integrate some type of metrics library and an exporter into your code.
In Java you could use Micrometer (https://micrometer.io/) and Prometheus.
Here you can find more information about them: https://micrometer.io/docs/registry/prometheus
After integrating metrics into your code, you simply need to configure Grafana to use InfluxDB as a data source and configure InfluxDB to scrape your metrics endpoint.
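For illustration, here is a minimal Java sketch of timing a batch job with a Micrometer Timer; the job name "nightly-import" and the Prometheus registry are assumptions (Micrometer also ships an InfluxDB registry you could use instead):

import io.micrometer.core.instrument.Timer;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

public class BatchJobMetrics {

    public static void main(String[] args) {
        // Registry whose scrape output your TSDB (and ultimately Grafana) reads.
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        Timer jobTimer = Timer.builder("batch.job.duration")
                .tag("job", "nightly-import")              // hypothetical job name
                .description("Complete runtime of the batch job")
                .register(registry);

        // record() measures the wrapped work from start to end, i.e. the complete runtime.
        jobTimer.record(BatchJobMetrics::runJob);

        // Expose registry.scrape() over an HTTP endpoint so it can be collected.
        System.out.println(registry.scrape());
    }

    private static void runJob() {
        // ... the actual batch work ...
    }
}

Grafana can then chart the recorded runtimes per job from whichever data source ends up holding the scraped values.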

Limiting the number of times an endpoint of a Kubernetes pod can be accessed?

I have a machine learning model inside a Docker image. I pushed the image to Google Container Registry and then deployed it inside a Kubernetes pod. There is a FastAPI application that runs on port 8000, and this FastAPI endpoint is public (call it mymodel:8000).
The structure of the FastAPI app is:

from fastapi import FastAPI, Form
app = FastAPI()

@app.get("/homepage")
async def get_homepage(): ...

@app.get("/model")
async def get_modelpage(): ...

@app.post("/model")
async def get_results(query: str = Form(...)): ...
Users can submit queries and get results from the machine learning model running inside the Docker container. I want to limit the number of queries that can be made by all the users combined. So if the query limit is 100, all the users combined can make only 100 queries in total.
I thought of a way to do this:
Keep a database that stores the number of times the GET and POST methods have been called. As soon as the total number of POST calls crosses the limit, stop accepting any more queries.
Is there an alternative way of doing this using Kubernetes limits? For example, could I define a limit_api_calls such that the total number of times mymodel:8000 is accessed is at most limit_api_calls?
I looked at the documentation and could only find settings for limiting CPU, memory and rate limits.
There are several approaches that could satisfy your needs.
Custom implementation: As you mentioned, keep the number of API calls received in a persistence layer and deny requests once the limit has been reached.
Use a service mesh: Istio (for instance) will let you limit the number of requests received and act as a circuit breaker.
Use an external API manager: Apigee will also let you limit and even charge your users; however, if it is only for internal use (not pay-per-use) I definitely wouldn't recommend it.
The tricky part is what you want to happen after the limit has been reached: if it is just a pod, you may exit the application to finish and clear it.
Otherwise, if you have a deployment with its replica set and several resources associated with it (like ConfigMaps), you probably want to use some kind of asynchronous alert or polling check to clean up everything related to your deployment. You may want to take a deeper look at orchestrators like Airflow (Composer) and use tools such as Helm to keep deployments manageable.

add record level custom latency metric in kafka streams

I'm trying to add a specific metric to my Kafka Streams application that will measure latency and report it to JMX.
I'm using the Streams DSL in Scala, so using the Processor API for metrics (which I know is possible) will not work for me.
The basic things I would like to understand are:
how to extract specific record properties (i.e. headers) to use as part of the metric calculation
how to add the new metric to the metrics reported to JMX
Thanks!
You will need to fall back to the Processor API to access record metadata like headers and to register custom metrics.
Note though, that you can mix and match the DSL and the Processor API, so it's not necessary to move off the DSL. Instead, you can plug in custom Processors or Transformers via KStream.process() or KStream.transform() (note that there are multiple "siblings" of transform() that you might want to use instead).
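For illustration, here is a rough Java sketch (callable from Scala just as well) of a Transformer that reads an assumed "event-timestamp" header and records the latency through a custom sensor registered via ProcessorContext.metrics(); sensors registered this way are exposed over JMX by the default JmxReporter. The header name, metric group and metric names are assumptions:

import java.nio.ByteBuffer;
import java.util.Collections;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Max;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class HeaderLatencyTransformer<K, V> implements Transformer<K, V, KeyValue<K, V>> {

    private ProcessorContext context;
    private Sensor latencySensor;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        // Register a custom sensor with the Streams metrics registry.
        latencySensor = context.metrics().addSensor("record-e2e-latency", Sensor.RecordingLevel.INFO);
        latencySensor.add(
            new MetricName("record-e2e-latency-avg", "custom-metrics",
                "Average latency from the header timestamp to processing", Collections.emptyMap()),
            new Avg());
        latencySensor.add(
            new MetricName("record-e2e-latency-max", "custom-metrics",
                "Maximum latency from the header timestamp to processing", Collections.emptyMap()),
            new Max());
    }

    @Override
    public KeyValue<K, V> transform(K key, V value) {
        // "event-timestamp" is an assumed header carrying an epoch-millis long.
        Header header = context.headers().lastHeader("event-timestamp");
        if (header != null) {
            long eventTs = ByteBuffer.wrap(header.value()).getLong();
            latencySensor.record(System.currentTimeMillis() - eventTs);
        }
        return KeyValue.pair(key, value);
    }

    @Override
    public void close() { }
}

In the DSL you would wire it in via KStream.transform() with a TransformerSupplier that returns a new HeaderLatencyTransformer; transformValues() is an alternative if you don't need access to the key.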

Custom metric for vertx

I am using Vert.x Micrometer metrics (with the Prometheus option) and I would like to ask if it's possible to add a custom metric.
I would like to keep track of the number of users with a specific property, so I need to call a service (via the event bus) to get this number. Finally, I want to include this number in the metrics collected by Prometheus.
Thank you
Michael
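One way to approach this, sketched below under stated assumptions: vertx-micrometer-metrics exposes its MeterRegistry via BackendRegistries, and you can register an ordinary Micrometer gauge on it that is refreshed periodically over the event bus. The event-bus address "user.service.count" and the 30-second refresh interval are assumptions:

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.vertx.core.Vertx;
import io.vertx.micrometer.backends.BackendRegistries;
import java.util.concurrent.atomic.AtomicLong;

public class UserCountMetric {

    public static void register(Vertx vertx) {
        // Registry managed by vertx-micrometer-metrics (null if metrics are not enabled).
        MeterRegistry registry = BackendRegistries.getDefaultNow();
        AtomicLong userCount = new AtomicLong(0);

        // The gauge simply reports the last value we stored.
        Gauge.builder("app.users.with.property", userCount, AtomicLong::get)
             .description("Number of users with the specific property")
             .register(registry);

        // Refresh the value periodically by asking the service over the event bus.
        vertx.setPeriodic(30_000, id ->
            vertx.eventBus().<Long>request("user.service.count", null, reply -> {
                if (reply.succeeded()) {
                    userCount.set(reply.result().body());
                }
            }));
    }
}

The gauge is then included in the Prometheus scrape output alongside the built-in Vert.x metrics.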

How to speed up the Nifi streaming logs to Kafka

I'm new to NiFi and trying to read files and push them to Kafka. From some basic reading, I'm able to do that with the following flow.
With this flow I'm able to achieve 0.5 million records/sec, of size 100 KB each. I would like to get up to a speed of 2 million/sec. Throughput from the ListFile and FetchFile processors through the SplitText processors is great, but it levels off at PublishKafka.
So clearly the bottleneck is PublishKafka. How do I improve this performance? Should I tune something on the Kafka end or on the NiFi PublishKafka end?
Can someone help me with this? Thanks
You can try using record-oriented processors, i.e. the PublishKafkaRecord_1_0 processor.
So your flow will be:
1. ListFile
2. FetchFile
3. PublishKafkaRecord_1_0 // configure with more than one concurrent task
With this flow we are not going to use the SplitText processors at all; instead, define Record Reader/Writer controller services in the PublishKafkaRecord processor.
In addition,
you can also distribute the load by using Remote Process Groups.
Flow:
1. ListFile
2. RemoteProcessGroup
3. FetchFile
4. PublishKafkaRecord_1_0 // in the Scheduling tab, configure more than one concurrent task
Refer to this link for more details regarding designing/configuring the above flow.
Starting from NiFi 1.8 we don't need to use a RemoteProcessGroup to distribute the load, as we can configure load balancing on connections (relationships).
Refer to this and the NiFi-5516 links for more details regarding these new additions in NiFi 1.8.

Kafka Streams - accessing data from the metrics registry

I'm having a difficult time finding documentation on how to access the data within the Kafka Streams metric registry, and I think I may be trying to fit a square peg in a round hole. I was hoping to get some advice on the following:
Goal
Collect metrics being recorded in the Kafka Streams metrics registry and send these values to an arbitrary end point
Workflow
This is what I think needs to be done, and I've completed all of the steps except the last (I'm having trouble with that one because the metrics registry is private). But I may be going about this the wrong way:
Define a class that implements the MetricsReporter interface. Build a list of the metrics that Kafka creates in the metricChange method (e.g. whenever this method is called, update a hashmap with the currently registered metrics).
Specify this class in the metric.reporters configuration property
Set up a process that polls the Kafka Streams metric registry for the current data, and ship the values to an arbitrary end point
Anyways, the last step doesn't appear to be possible in Kafka 0.10.0.1 since the metrics registry isn't exposed. Could someone please let me know if this is the correct workflow (it sounds like it's not...), or if I am misunderstanding the process for extracting the Kafka Streams metrics?
Although the metrics registry is not exposed, you can still get the value of a given KafkaMetric via its KafkaMetric.value() / KafkaMetric.value(timestamp) methods. For example, as you observed in the JmxReporter, it keeps the list of KafkaMetric objects passed through its init() and metricChange/metricRemoval methods, and then in its MBean implementation, when getAttribute is called, it calls the corresponding KafkaMetric.value() function. So for your customized reporter you can apply a similar pattern: for example, periodically poll value() on all of the KafkaMetric objects you have kept and pipe the results to your end point.
The MetricsReporter interface in org.apache.kafka.common.metrics already gives you access to all Kafka Streams metrics inside the reporter, so the internal Kafka registry is not needed.
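To make that concrete, here is a rough sketch of such a reporter; the class name, the 30-second polling interval and the publish step are placeholders for whatever your end point needs:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

public class PollingMetricsReporter implements MetricsReporter {

    private final Map<MetricName, KafkaMetric> metrics = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    @Override
    public void init(List<KafkaMetric> initialMetrics) {
        initialMetrics.forEach(m -> metrics.put(m.metricName(), m));
        // Poll every 30 seconds and ship the current values to your end point.
        scheduler.scheduleAtFixedRate(this::publish, 30, 30, TimeUnit.SECONDS);
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        metrics.put(metric.metricName(), metric);
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {
        metrics.remove(metric.metricName());
    }

    private void publish() {
        // Replace with an HTTP call, a producer, etc. On clients newer than
        // 0.10.x you would use metric.metricValue() instead of value().
        metrics.forEach((name, metric) ->
            System.out.printf("%s = %s%n", name.name(), metric.value()));
    }

    @Override
    public void close() {
        scheduler.shutdown();
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}

You would then register it exactly as in step 2 of your workflow, e.g. via StreamsConfig.METRIC_REPORTER_CLASSES_CONFIG (the metric.reporters property).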