PagerDuty: allow time to self-resolve before alerting - pagerduty

Is there an easy way to set up PagerDuty to hold alerts, to allow time for the cloud to self-resolve?
My team often gets woken up for alerts that self-resolve before we have a chance to address them. If PagerDuty could hold an alert for 5 minutes, we would avoid unactionable alerts.

You can use Event Orchestration or Event Rules at a service level to control this. What you are looking for is to create an alert but pause notifications. This will suspend the alert for a time period of your choice, allowing the alert to resolve itself. If the alert doesn't resolve within the time period an incident will open as expected.
Event Orchestration (should be available on basic+ tiers)
Once you define the conditions for the Orchestration rule, set an incident action to pause notifications for 300 seconds (5 minutes).
https://support.pagerduty.com/docs/event-orchestration#incident-actions
If you have the Event Intelligence package, you can also look at the auto-pause feature, which detects and pauses transient alerts for you.
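To make the "create the alert but pause notifications" behaviour concrete, here is a minimal Python sketch of the hold logic only. It is not PagerDuty's API or implementation; the class name `HeldAlerts`, its methods, and the alert keys are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

HOLD_SECONDS = 300  # the 5-minute pause discussed above


@dataclass
class HeldAlerts:
    """Toy model (hypothetical, not PagerDuty code) of pausing notifications."""
    hold_seconds: int = HOLD_SECONDS
    pending: dict = field(default_factory=dict)  # alert key -> time first seen

    def trigger(self, key: str, now: float) -> None:
        # Start the hold window only on the first trigger for this key.
        self.pending.setdefault(key, now)

    def resolve(self, key: str) -> None:
        # A resolve inside the window drops the alert without notifying anyone.
        self.pending.pop(key, None)

    def due(self, now: float) -> list:
        # Alerts that outlived the hold window would open incidents.
        expired = sorted(
            key for key, seen in self.pending.items()
            if now - seen >= self.hold_seconds
        )
        for key in expired:
            del self.pending[key]
        return expired
```

In this sketch, an alert that is triggered and resolved within 300 seconds never shows up in `due()`, while one that stays open is returned once the window has elapsed.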

Related

Wind Turbine Maintenance tutorial model - simulation AnyLogic

I have now set up the model and everything is working. I was wondering whether there is some way to prevent downtime by editing the scheduled event, or even by adding an extra scheduled event.
Link to the tutorial: https://anylogic.help/tutorials/turbine-maintenance/1-different-types-of-agents.html
I guess I need to add an event and connect it to the failure rate somehow, because right now the failure rate is static at 1/MTTF, where MTTF = 50 days.
So the question is: how do I add an event that prevents downtime and is connected to the failure rate in some way?

How to get CloudWatch to send an alarm every time a threshold has been breached?

Currently using this script to monitor:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html
When the script is first run we receive the email, but after that the email never sends (even though the threshold has been breached continuously).
Amazon CloudWatch alarms only trigger notifications when the state of the alarm changes. They will not keep sending notifications while the state is ALARM, and it is not possible to configure such behavior.
One exception to this is triggering Auto Scaling changes: the alarm will continually try to trigger an Auto Scaling policy while it is in the ALARM state.
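The state-change rule explains why only the first email arrives. A small Python sketch of that rule follows; it is a model of the behaviour described above, not AWS code, and the function name `alarm_notifications` and the assumed initial state are illustrative only.

```python
def alarm_notifications(evaluations, initial="OK"):
    """Return the indices of evaluations at which a notification would be sent.

    A notification is sent only when the state changes into ALARM,
    never while it simply stays in ALARM.
    """
    sent = []
    previous = initial  # assumed starting state for this sketch
    for index, state in enumerate(evaluations):
        if state == "ALARM" and previous != "ALARM":
            sent.append(index)
        previous = state
    return sent
```

For a sequence that breaches once and stays breached, this yields a single notification; a new one appears only after the state has gone back to OK and breached again.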
This is an example of a custom metric that resets itself every hour: Custom CloudWatch metrics with hourly reset. Because of the reset, the notifications are also repeated every hour if the alarm has not been resolved.

Sending Reminders for Tasks

I have recently been thinking about a possible architecture for a simple task reminder system. A user schedules a task, and a reminder in the form of an SMS/email/Android notification needs to be sent to all stakeholders some x minutes before the task is scheduled to be performed (much the same way Google Calendar works). The problem here is sending the reminder at that precise point in time. Here are the two possible approaches I can think of:
Cron: I can set up a cron job to run every minute. It will scan the table for notifications that need to be sent in the next minute and simply send them. But precision is lost, as there is always the chance of a +/-1 min error.
Work Queues: I can simply put a message with the appropriate delay in a queue at the time the task is scheduled. Workers will send the notification as and when they receive the message. I can add as many workers as I want if my real-time behavior starts being affected by load. There are still a few issues. How do I choose the appropriate work queue? I have evaluated RabbitMQ and Beanstalk. While RabbitMQ follows the standard AMQP protocol and is widely suggested, it doesn't provide delay functionality out of the box. There are ways to simulate this using dead-letter exchanges, but this will not work in my case because the delay needs to be variable. Beanstalk supports this, but the problem is that the Beanstalk queue resides entirely in memory, which I don't like (but can live with). Any possible alternatives?
Third Approach: ??????. I am sure a simple desktop notification tool does neither of the two. What technology do they use to achieve the same thing?
We had the same scenario, and we use Redis for long schedules; even now we hold reminders for up to 2 years. You can use a Sorted Set where the timestamp is the score.
We use Beanstalkd delayed jobs for reminders that we know are relatively short term (a couple of hours) and that will not be cancelled. To remove a delayed message from Beanstalkd you need to retain the job id in a database for later removal, and that is not viable.
Although you mention the memory limit, we use persistence on both Redis and Beanstalkd.
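The "timestamp as score" idea can be sketched without a Redis server. The Python class below is an in-memory stand-in, not the answerer's code; the name `ReminderSet` and its methods are hypothetical, and the comments only indicate which Redis sorted-set command (ZADD, ZREM, ZRANGEBYSCORE) each step is meant to correspond to.

```python
class ReminderSet:
    """In-memory stand-in for a sorted set whose score is the due timestamp."""

    def __init__(self):
        self._scores = {}  # reminder id -> due timestamp (the "score")

    def schedule(self, reminder_id, due_at):
        # Comparable to ZADD: adding an existing member updates its score.
        self._scores[reminder_id] = due_at

    def cancel(self, reminder_id):
        # Comparable to ZREM: cancellation needs only the reminder id.
        self._scores.pop(reminder_id, None)

    def pop_due(self, now):
        # Comparable to ZRANGEBYSCORE up to `now`, followed by ZREM.
        due = sorted(
            (due_at, rid) for rid, due_at in self._scores.items() if due_at <= now
        )
        for _, rid in due:
            del self._scores[rid]
        return [rid for _, rid in due]
```

A worker would call `pop_due(now)` in a loop and send whatever comes back. Because members are addressed by id, cancelling or rescheduling does not require tracking a separate job id, which is the drawback the answer describes for delayed jobs.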

Avoiding Local Push notification to fire after changing time

In my app, I want to send a local push notification every 30 minutes. One way is to just configure the local push notification and fire it. However, the user could change the device time and move it forward 30 minutes, and could cheat that way.
I want to configure my app so that the notification only occurs after 30 real minutes. How can I do that? My app does communicate with a server and can get its timestamp, but I want a solution that doesn't use many server resources.
The only way I can think of to detect a user altering the system clock is as follows:
1. When the app launches, ask your server for the time and note the difference between that and [NSDate date]. Persist that as [NSNumber numberWithFloat:serverOffset];
2. Implement a method like - (BOOL)deviceClockChanged that asks the server again and compares the new difference to the persisted value. If they differ by more than some small tolerance for clock drift plus latency on the sync request, you can conclude that the clock was changed. Do all this in UTC so it works regardless of the user travelling between time zones.
3. Consider this: if the user wants badly enough to fool your app about the time in order to delay a notification, messing up the rest of his phone in the process, maybe you ought to just let him edit the notification schedule.
I can supply code examples for points 1 and 2 if you want, and if you want I can supply some @"alert text" for point 3 that will make the user feel really guilty about editing his notifications.
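The offset comparison from the first two points can be sketched in a language-neutral way. The Python below is only an illustration of the arithmetic, not the answerer's code; the function names and the 5-second tolerance are assumptions, and all timestamps are taken to be UTC seconds.

```python
TOLERANCE_SECONDS = 5.0  # assumed allowance for clock drift plus request latency


def clock_offset(server_utc, device_utc):
    """Difference between server time and device time, in seconds."""
    return server_utc - device_utc


def device_clock_changed(saved_offset, server_utc, device_utc,
                         tolerance=TOLERANCE_SECONDS):
    """True if the current offset differs from the saved one beyond tolerance."""
    current = clock_offset(server_utc, device_utc)
    return abs(current - saved_offset) > tolerance
```

The saved offset is computed once at launch; each later check needs a single server timestamp, so the server cost stays at one lightweight request per check.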
My original answer here. If you choose to let the user edit notifications, these methods will be key. UIApplication has a property:
NSArray *scheduledLocalNotifications;
and implements:
- (void)cancelLocalNotification:(UILocalNotification *)notification
So to change one, cancel it, then reschedule it.

Time based Sagas with Event Sourcing

Let's say I wanted to have a saga that gets created by some event, then sits and waits for a few hours, and if nothing happens, sends off some command.
Now, if this Saga was all in-memory and I had to restart the app/server, the saga would be unloaded and never seen again, right?
Would I use Event Sourcing to bring this Saga up to speed once the system is back online?
If so, I would need pretty much a separate Event Store with "active sagas" that can be replayed at system startup, to get my Sagas up to speed. So far it seems good to me, but how would I implement the timeout?
I would need some way of "faking" the timeouts at replay, taking into account that there may be several subsequent timeouts depending on the events going into the saga.
The best way to achieve this capability is with another endpoint that is capable of returning a message back to you at a certain point in time. For example, your saga may dispatch a message to this "timeout manager" and say wake me in 1 hour or 1 day or even 1 year. The message would then be returned to you at that time. Ideally this message would have business meaning that would cause an action to occur.
Perhaps the best example of this is something like customer signup where, if the customer hasn't confirmed their account within 7 days of signup, you'd notify them via email. The "timeout message" would effectively be: RemindUserToConfirmAccountMessage. When this message is received back by the saga after 7 days, the saga would determine, based upon its current state, whether that message needs to be handled and a customer email needs to be sent. If the user has already confirmed their account, the message can be discarded with no action taken.
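The signup example can be sketched as follows. This is a minimal in-memory illustration in Python, not a specific framework's API: `TimeoutManager`, `SignupSaga`, and the "AccountConfirmed" message name are hypothetical, and a real timeout manager would persist its pending messages so they survive a restart.

```python
import heapq

SEVEN_DAYS = 7 * 24 * 3600  # seconds


class TimeoutManager:
    """Holds messages and hands them back to their saga once they are due."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal due times keep insertion order

    def request(self, due_at, saga_id, message):
        heapq.heappush(self._heap, (due_at, self._seq, saga_id, message))
        self._seq += 1

    def deliver_due(self, now, sagas):
        # Return every message whose due time has passed to its saga.
        while self._heap and self._heap[0][0] <= now:
            _, _, saga_id, message = heapq.heappop(self._heap)
            sagas[saga_id].handle(message)


class SignupSaga:
    def __init__(self):
        self.confirmed = False
        self.emails_sent = 0

    def handle(self, message):
        if message == "AccountConfirmed":
            self.confirmed = True
        elif message == "RemindUserToConfirmAccountMessage" and not self.confirmed:
            self.emails_sent += 1  # stands in for sending the reminder email
        # A reminder arriving after confirmation is discarded with no action.
```

The saga never waits in memory: it records its state, asks for a wake-up message, and decides what to do only when that message comes back. On replay, nothing has to be faked, because the pending timeout lives with the timeout manager rather than inside the saga.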