MongoDB: Alert when count of records is high

I am new to MongoDB and have a collection like the one below:
boxname
time_create
box_data
Basically, what we are logging here is which box is sending what data and at what time.
Now my requirement is to create an alert in the system if a box sends more requests than a threshold value, indicating that something is probably wrong or suspicious with that box. I can get the count of records for a single box over a time period, say 10 minutes, but this query is different: every 10 minutes it has to check the counts for all boxes and flag any that exceed the threshold.
Do I need to poll regularly every 10 minutes? Does a program need to run indefinitely to count records and raise alerts? What would be the best mechanism to implement such a requirement?

To solve this problem, a query scheduler is needed.
That means every x minutes we execute the query and check whether any box matches the threshold criteria.
The scheduler implementation depends mostly on your current solution architecture (as I work in C#, HangFire would be my choice for implementing the scheduler).
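Whatever scheduler you choose, the job it runs can be a single aggregation. A minimal sketch in the mongo shell, assuming the collection is named box_logs and using the boxname/time_create fields from the question (the threshold of 1000 is a placeholder):

// Count records per box over the last 10 minutes and keep only
// the boxes whose count exceeds the threshold (1000 is a placeholder).
db.box_logs.aggregate([
  { $match: { time_create: { $gte: new Date(Date.now() - 10 * 60 * 1000) } } },
  { $group: { _id: "$boxname", requests: { $sum: 1 } } },
  { $match: { requests: { $gt: 1000 } } }
])

Any document returned by the pipeline is a box worth alerting on.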

Related

Is there a way to get the estimated time of completion of a currently running Informatica workflow from the Infa metadata tables

I am currently working with the metadata table REP_WFLOW_RUN from the Infa DB to get the status of workflows. The column run_status_code shows whether a workflow is running, succeeded, stopped, aborted, etc.
But for my business use case I also need to report to the business the estimated completion time of a particular workflow.
Example: if a workflow generally starts at 6:15, then along with the information that the workflow has started, I also want to convey that it is estimated to complete at such and such a time.
Could you please guide me if you have any details on how to get this info from the Informatica database?
Many thanks in advance.
This is a very good question, but no one can answer it exactly :)
You can, however, apply the same logic other scheduling tools use.
First, calculate the average time the workflow takes to complete a successful run. The output is a decimal value (in hours below).
select avg(end_time - start_time) * 24 avg_time_in_hr, workflow_name
from REP_WFLOW_RUN
where run_status_code = 'succeeded'
group by workflow_name
You can then use the above value to compute the estimated completion time for that workflow. The output is a datetime.
select sysdate + avg_time_in_hr/24 est_time_to_complete from dual
Now, this value is an estimate and not an exact figure. On a bad day, if the workflow takes hours, the average will be skewed, but we can't do much about that here.
I assumed your Infa metadata repository is on Oracle.
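The two steps can also be combined into one statement. A sketch, assuming end_time and start_time are Oracle DATE columns (date arithmetic then yields days directly, so no conversion through hours is needed) and keeping the author's status filter as-is:

-- estimated completion time per workflow, in one query
select workflow_name,
       sysdate + avg(end_time - start_time) est_time_to_complete
from REP_WFLOW_RUN
where run_status_code = 'succeeded'
group by workflow_name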

Grafana - Alert based on a minimum of requests

I hope you can help me.
I am collecting some metric data, and I want to send out an alert when a specific error rate is reached on these metrics.
To be clear, my data looks something like this:
Timestamp
value (the runtime of a query)
state (error, success)
api-endpoint called
I have a Grafana board doing some calculations, which outputs something like this:
error-rate
api-endpoint
number of calls made to the api endpoint
Fine so far: based on what I can read out in Grafana, I am able to send error messages/warnings when the error rate is too high. Works like a charm. But now comes the catch:
If, say, the first two calls to a specific API fail, Grafana will instantly send me an alarm. I do not want that!
Is it possible, and if so, how, to alert me ONLY if this specific request was executed at least 5 times? It is no problem if this is a generic alert like "hey, something is wrong!", but I need to make sure that the request triggering the alarm with a 50-100% error rate was executed at least a certain number of times before alarming.
It has to be done based on tags/fields; I do not want to add a separate query for each of my 35+ APIs (and the number is growing).
Any ideas, anybody?
Using Grafana 8.0
Using InfluxDB 1.8 (with Flux enabled)
Just to make it clear for anyone who ever wants to do the same: Flux is king. You can use its filter functionality to base the alert only on datasets where the value count is greater than X.
Yes, Flux is damn hot. I have loved it for a few months now.
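A minimal Flux sketch of that idea, counting calls per endpoint over the alert window and keeping only endpoints with at least 5 calls; the bucket, measurement, field, and tag names ("metrics", "requests", "state", "endpoint") are assumptions to replace with your own:

from(bucket: "metrics")
  |> range(start: -10m)
  |> filter(fn: (r) => r._measurement == "requests" and r._field == "state")
  |> group(columns: ["endpoint"])
  |> count()
  // drop endpoints that were called fewer than 5 times in the window
  |> filter(fn: (r) => r._value >= 5)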

Can you calculate active users using time series

My Atomist client exposes metrics on commands that are run. Each command is a metric with a username element as well as a status element.
I've been scraping this data for months without resetting the counts.
My requirement is to show the number of active users over a time period, i.e. 1h, 1d, 7d and 30d, in Grafana.
The original query was:
count(count({Username=~".+"}) by (Username))
This is an issue because I don't clear the metrics, so it's always a count since inception.
I then tried this:
count(
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w])
  -
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w] offset 1w)
  > 0
)
which works, but only for one command; I have about 50 other commands that need to be added to that count.
I then tried:
{__name__=~".+_command",job="app name"}[1w] offset 1w
but this is obviously very expensive (it times out in the browser) and has issues integrating with max_over_time, which doesn't support it.
Any help? Am I using the metric in the wrong way? Is there a better way to query? My only option at the moment is to repeat the working count format above for each command.
Thanks in advance.
To start, I will point out a number of issues with your approach.
First, the Prometheus documentation recommends against using arbitrarily large sets of values for labels (as your usernames are). As you can see (based on your experience with the query timing out) they're not entirely wrong to advise against it.
Second, Prometheus may not be the right tool for analytics (such as active users). Partly due to the above, partly because it is inherently limited by the fact that it samples the metrics (which does not appear to be an issue in your case, but may turn out to be).
Third, you collect separate metrics per command (e.g. help_command, foo_command) instead of a single metric with the command name as a label (e.g. command_usage{command="help"}, command_usage{command="foo"}).
To get back to your question though, you don't need max_over_time; you can simply write your query as:
count by(__name__) (
  (
    {__name__=~".+_command",job="Application Name"}
    -
    {__name__=~".+_command",job="Application Name"} offset 1w
  ) > 0
)
This only works, though, because you say that whatever exports the counts never resets them. If that is simply because the exporter has never restarted, and the counts will drop to zero when it does, then you'd need to use increase instead of subtraction, and you'd run into the exact same performance issues as with max_over_time.
count by(__name__) (
  increase({__name__=~".+_command",job="Application Name"}[1w]) > 0
)
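For comparison, with the single-metric layout suggested in the third point above, the same check collapses to one cheap query; the command_usage metric and command label are assumptions:

# command_usage and its "command" label are assumed (see point three)
count by(command) (
  increase(command_usage{job="Application Name"}[1w]) > 0
)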

JMeter to record results on hourly basis

I have a JMeter project with multiple GET and POST requests and assertions for them. I use the Aggregate Results and View Results Tree listeners, but neither of them can store results on an hourly basis. I tried the JMeterPlugins-Standard and JMeterPlugins-Extras packages and the jp@gc - Graphs Generator listener, but all of them use aggregated data instead of hourly data. So I would like to get the number of successful and failed requests/assertions per hour; a bar chart would probably be the most suitable for this purpose.
I'm going to suggest an unconventional, design-level solution: name your samplers dynamically with the hour (or date and hour), so that each hour the name changes and the results land in a different category.
The code for such a name is:
${__time(dd:HH,)} the rest of sampler name
Such a sampler will show up in the Aggregate Report as a separate row per time slice, e.g. 05:14 the rest of sampler name (here I simulated it with minutes/seconds, but the same will happen with days/hours, just on a larger scale).
Pros and cons of such an approach:
Simple: you can aggregate anything by hour, minute, or any other time slice while the test is running, rather than by analysis after execution.
Not listener-dependent; can be used with pretty much any listener or visualizer.
If you also want overall stats, you will have to sum up every sub-category. So it alters the data, but in a way that can still be added back to the original relatively easily.
Calculating __time before every sampler will not go completely unnoticed from a performance perspective, but I don't think it adds visible overhead to a script.
You could get the same data by properly aggregating the JTL or CSV file (whichever you use) after execution, so it doesn't give you anything that isn't achievable with standard methods.
The script needs altering to make this happen; if you have hundreds of samplers, it's going to take a while. And the same again if you want to change back...
You might want to use the Filter Results Tool, which has --start-offset and --end-offset parameters: you can "cut" your results file into "interesting" pieces and plot them according to your requirements.
You can install the Filter Results Tool using the JMeter Plugins Manager.
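As a sketch, cutting out the second hour of a run could look like this; offsets are in seconds from test start, the file names are examples, and the exact flag names may vary by plugin version:

FilterResults.sh --input-file results.jtl \
                 --output-file results-hour2.jtl \
                 --start-offset 3600 \
                 --end-offset 7200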
Also be aware that, according to JMeter Best Practices, you should:
Use as few Listeners as possible; if using the -l flag as above they can all be deleted or disabled.
Don't use "View Results Tree" or "View Results in Table" listeners during the load test, use them only during scripting phase to debug your scripts.
You can get whatever information you need from the .jtl results file; you can specify the test results location via the -l command-line argument.
To get summarized results per hour, add a Generate Summary Results listener to your test plan:
Generates a summary of the test run so far to the log file and/or standard output
Update the interval in jmeter.properties to suit your needs; for 1 hour that is 3600 seconds:
summariser.interval=3600
You will then get a summary of your requests per hour.
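For reference, a sketch of the related jmeter.properties entries (these property names exist in stock JMeter; the values are examples):

# prefix of the generated summary lines
summariser.name=summary
# write a summary line every hour
summariser.interval=3600
# write summaries to jmeter.log
summariser.log=true
# and also to standard output
summariser.out=true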
You can try the JMeter Backend Listener. It has integrations with Graphite and InfluxDB. After storing the results in one of these time series databases, you can display them in a Grafana dashboard. Grafana has its own controls for showing results on an hourly, daily, monthly basis and so on.
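A sketch of typical Backend Listener settings for the InfluxDB client (parameter names as they appear in the listener's GUI; the URL and database are examples for a local InfluxDB 1.x):

Backend Listener implementation: org.apache.jmeter.visualizers.backend.influxdb.InfluxdbBackendListenerClient
influxdbUrl:  http://localhost:8086/write?db=jmeter
application:  my_test
measurement:  jmeter
summaryOnly:  false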

How to reduce throughput in Talend?

I am trying to reduce the throughput to 3 rows/s. I tried searching for this online but didn't find much. Can anybody help?
My current job looks like this:
Or is it possible to limit the request rate in the tHttpRequest component itself?
For this you will want to use the tSleep component, which introduces a wait time per row.
The wait time is in seconds, but you might be able to use a floating-point value (e.g. 0.3333); otherwise you'll be limited to 1 row/s.
If you can't use a floating-point value in the tSleep configuration and you absolutely need 3 rows per second, then you could use a tJavaRow component that passes everything from the input to the output but also runs this snippet of Java code:
Thread.sleep(333);
This sleeps the running thread for 333 milliseconds for each row of data passing through the component, giving you roughly 3 rows per second (minus actual processing time, which in this case should be minimal).
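A slightly fuller sketch of the tJavaRow body, assuming a single String column named payload in the schema (adjust to your actual columns); the try/catch is there because Thread.sleep can throw a checked InterruptedException:

// Pass the row through unchanged; repeat for every column in your schema.
output_row.payload = input_row.payload;

// Pause ~333 ms per row to cap throughput at roughly 3 rows/s.
try {
    Thread.sleep(333);
} catch (InterruptedException e) {
    // restore the interrupt flag so the job can stop gracefully
    Thread.currentThread().interrupt();
}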