MongoDB MMS-like alert generation - mongodb

I'm curious how MMS generates alerts on all the metrics, across all the alert configurations, across all the groups, across all the accounts.
What I would do is query alert configs and active alerts when a ping with new data comes in, and then generate an alert based on the new data.
However, what if some of the metrics can't be determined from the current ping alone, such as page faults avg/sec? This metric is derived from previous pings.
Would there be a background worker of sorts that polls periodically for every alert config in every group in every account? Or would every new ping quickly fetch the last 1 or 5 minutes of data for that metric (perhaps only if such an alert configuration exists)?
Also, when no new pings come in, the averages should decay and fire new alerts or update existing ones.
I understand the pre-aggregated metrics as shown in this blog post, and even reporting, but how does one link this with alert generation and management?
This seems to be non-trivial to solve.
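One way to picture the two options raised in the question (evaluate on each ping, plus a periodic sweep so that rates decay when pings stop) is the minimal in-memory sketch below. It is not how MMS works internally; `AlertConfig`, `AlertEvaluator`, the metric name `page_faults`, and the host name are all hypothetical, and pings are assumed to carry cumulative counters.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class AlertConfig:
    # Hypothetical config: alert when the per-second rate of `metric`,
    # averaged over the last `window_s` seconds, exceeds `threshold`.
    metric: str
    threshold: float
    window_s: float = 60.0


class AlertEvaluator:
    def __init__(self, configs):
        self.configs = configs
        self.samples = defaultdict(deque)  # (host, metric) -> (ts, counter) pairs
        self.active = set()                # (host, metric) pairs currently alerting

    def on_ping(self, host, ts, counters):
        """Store the cumulative counters from a ping, then re-evaluate the host."""
        for metric, value in counters.items():
            self.samples[(host, metric)].append((ts, value))
        return self.evaluate(host, ts)

    def rate(self, host, metric, now, window_s):
        """Counter increase inside the window ending at `now`, per second."""
        q = self.samples.get((host, metric))
        if not q:
            return 0.0
        # Keep only the newest sample at or before the window start as baseline.
        while len(q) > 1 and q[1][0] <= now - window_s:
            q.popleft()
        if len(q) < 2:
            return 0.0  # nothing new inside the window: the rate has decayed
        return (q[-1][1] - q[0][1]) / window_s

    def evaluate(self, host, now):
        """Open or close alerts for one host; returns the state transitions."""
        events = []
        for cfg in self.configs:
            key = (host, cfg.metric)
            firing = self.rate(host, cfg.metric, now, cfg.window_s) > cfg.threshold
            if firing and key not in self.active:
                self.active.add(key)
                events.append(("opened", host, cfg.metric))
            elif not firing and key in self.active:
                self.active.discard(key)
                events.append(("closed", host, cfg.metric))
        return events

    def sweep(self, now):
        """Background pass: re-evaluate every known host even without new pings."""
        hosts = sorted({host for host, _ in self.samples})
        return [event for host in hosts for event in self.evaluate(host, now)]


ev = AlertEvaluator([AlertConfig("page_faults", threshold=50.0, window_s=60.0)])
ev.on_ping("db1", 0, {"page_faults": 0})
opened = ev.on_ping("db1", 30, {"page_faults": 6000})  # 6000 / 60 s = 100/sec
closed = ev.sweep(100)  # no pings since t=30, so the window is now empty
```

Here the ping path only touches the configs for the host that reported, while `sweep` would be driven by a timer so that silence is noticed too. A real system would keep the samples and active alerts in a database rather than in process memory.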

Related

Create an alert within NetSuite that sends out an email when any of these sales channels have no sales after 2 hours?

I am trying to create an alert on my saved search that will email when any of our sales channels do not have an order created within 2 hours.
This is the results criteria for the saved search
The lead sources are the sales channels, and the maximum of date created is the last time an order was created. If it goes past 2 hours, I want to be notified via email.
This is not possible to create purely within the UI.
You could create a scheduled script which will load the search, parse the results for any that are older than your threshold, and send the email from the search. This would run periodically depending on your deployment settings. Scheduled scripts can be deployed to run every 15 minutes, so the latest order may be up to ~2:15 old before the alert is sent.
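The check such a scheduled script would perform can be sketched independently of NetSuite. The snippet below is plain Python rather than SuiteScript and uses no NetSuite API; the channel names and timestamps are made up, and `stale_channels` is a hypothetical helper that takes the search results as a mapping of lead source to latest order time.

```python
from datetime import datetime, timedelta


def stale_channels(last_order_by_channel, now, threshold=timedelta(hours=2)):
    """Return the channels whose most recent order is older than `threshold`."""
    return sorted(
        channel
        for channel, last_order in last_order_by_channel.items()
        if now - last_order > threshold
    )


# Made-up search results: lead source -> maximum of date created.
results = {
    "Web": datetime(2024, 1, 1, 11, 30),
    "Amazon": datetime(2024, 1, 1, 9, 45),
    "eBay": datetime(2024, 1, 1, 10, 0),
}
now = datetime(2024, 1, 1, 12, 0)
to_notify = stale_channels(results, now)

# Worst case before an alert goes out: threshold plus one schedule interval.
worst_case = timedelta(hours=2) + timedelta(minutes=15)
```

With these values only "Amazon" is past the threshold; "eBay" is at exactly 2 hours and is picked up on the next run, which is where the extra 15 minutes comes from.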
Another approach may be to use a workflow which initiates on record creation and then has a 2-hour delay. Following the delay, it could run a search for any newer orders; if any are found it could simply exit, and if none are found it could proceed to send an email. The actual implementation of running the saved search and acting on the results will probably require a SuiteScript custom workflow action.

What is the difference between an alert and an incident?

I see both the terms "alert" and "incident" being used on PagerDuty. What are the differences? How are they related?
When PagerDuty receives a qualifying event (from a monitoring tool, for example), it triggers an alert, which in turn triggers an incident. Multiple alerts can be aggregated into a single incident for triage, which streamlines incident handoff between teams, centralizes critical information, and reduces notification fatigue. Alerts can move from one incident to another, either manually or via an automated process, such as Alert Grouping. Here is more information on it: https://support.pagerduty.com/docs/alerts
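The relationship can be pictured with a toy data model. This is only an illustration of "many alerts per incident" and of moving an alert between incidents; it is not PagerDuty's API or schema, and every class, function, and field name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    alert_id: str
    summary: str


@dataclass
class Incident:
    incident_id: str
    alerts: dict = field(default_factory=dict)  # alert_id -> Alert

    def add(self, alert):
        self.alerts[alert.alert_id] = alert


def move_alert(alert_id, source, target):
    """Detach an alert from one incident and attach it to another."""
    target.add(source.alerts.pop(alert_id))


db_incident = Incident("INC-1")
web_incident = Incident("INC-2")
db_incident.add(Alert("A1", "High replication lag"))
db_incident.add(Alert("A2", "HTTP 500 rate above threshold"))

# A2 turns out to belong with the web outage, so regroup it.
move_alert("A2", db_incident, web_incident)
```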

Grafana query when dashboard is not visible

I'm wondering how Grafana fires its queries (to the data source).
Does it only fire queries when the dashboard/panel is visible, or does it keep firing queries based on the default time period of all dashboards?
My assumption was that since Grafana itself doesn't store data, the queries would be made on an as-needed basis, but I've been seeing HTTP requests occur periodically from my VM.
This is relevant for data sources such as CloudWatch, where each API call can be charged.
Yes, you are correct. Grafana dashboards/panels fire queries only when they are visible (loaded in the browser).
But you may have alert rules in Grafana, and those are evaluated periodically regardless of whether a dashboard is open. So I guess the alerts are the source of your queries.

Unable to set up Azure alert on resource-specific events

In the past, it was possible to set up an Azure alert on a single event for a resource, e.g. on a single Data Factory RunFinished event where the status is Failed*.
This appears to have been superseded by "Activity Log Alerts".
However, these alerts seem to work only on either a metric threshold (e.g. number of failures in 5 minutes) or on events related to the general administration of the resource (e.g. whether it has been deployed), not on the operations of the resource.
A threshold doesn't make sense for Data Factory, because a data factory may only run once a day. If a failure happens and does not happen again X minutes later, that doesn't mean it has been resolved.
The activity event alerts don't seem to cover things like failures.
Am I missing something?
Is it because this is expected to be done in OMS Log Analytics now? Or perhaps even in Event Grid later?
*n.b. it is still possible to create these alert types via ARM templates, but you can't see them in the portal anymore.
The events you're describing are part of a resource type's diagnostic logs, which are not alertable in the same way that the Activity Log is. I suggest routing the data to Log Analytics and setting the alert there: https://learn.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor

How does Fabric Answers send data to the server, should events be submitted periodically or immediately?

I've used Fabric for quite a few applications, however I was curious about the performance when a single application submits potentially hundreds of events per minute.
For this example I'm going to use a pedometer application, in which I want to keep track of the number of steps users take. Considering the average user walks 100 steps per minute, I wouldn't want the application to be sending several dozen updates to the server.
How would Fabric handle this? Would it just tell the server "Hey, there were 273 step events in the last 5 minutes with this metadata", or would it send 273 step events?
Pedometer applications typically run in the background, so how would we get data to Fabric without the user opening the application?
Great question! Todd from Fabric. These get batched and sent at time intervals, and certain events (like installs) also trigger an upload of the queued event data. You can watch our traffic in the Xcode debugger if you are curious about the specifics for your app.
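The batching behaviour described in that answer can be sketched as a small queue that flushes on a timer or on selected event names. This is a generic illustration, not the Fabric SDK: `EventBatcher`, the 300-second interval, and the `install` trigger are assumptions chosen to mirror the numbers in the question.

```python
class EventBatcher:
    """Queue events and hand them to `send` in batches."""

    def __init__(self, send, flush_interval_s=300.0, flush_on=("install",)):
        self.send = send                      # callable that uploads one batch
        self.flush_interval_s = flush_interval_s
        self.flush_on = set(flush_on)         # event names that force an upload
        self.queue = []
        self.last_flush = 0.0

    def log(self, name, now, **attrs):
        self.queue.append({"name": name, "ts": now, **attrs})
        interval_elapsed = now - self.last_flush >= self.flush_interval_s
        if name in self.flush_on or interval_elapsed:
            self.flush(now)

    def flush(self, now):
        if self.queue:
            self.send(self.queue)
            self.queue = []
        self.last_flush = now


sent_batches = []
batcher = EventBatcher(sent_batches.append)

# 273 step events, one per second: all queued, nothing uploaded yet.
for second in range(273):
    batcher.log("step", now=float(second))
queued_before_flush = len(batcher.queue)

batcher.log("step", now=300.0)     # interval elapsed: one batch goes out
batcher.log("install", now=301.0)  # trigger event: uploaded immediately
batch_sizes = [len(batch) for batch in sent_batches]
```

In this sketch each step is still recorded as its own event, but the network sees one upload per interval rather than one per step.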