Zabbix trigger dependency - triggers

Let's say I have two hosts. One is a firewall and the other is an L2 device.
I set up ping check items with different intervals: 1 minute for the firewall and 5 minutes for the L2 device.
I also set triggers for both of them so that I am informed when there is a disconnection or error. The FW trigger depends on the L2 trigger.
If there is a problem with the FW, I should get a notification from that trigger, and if there is a problem with the L2 device, I should get one notification from the L2 trigger only.
However, I am curious how this works when their ping check intervals are not the same.
Regardless of the interval difference, does Zabbix check the L2 device's ping right after it detects a problem with the FW, so that it can determine whether the problem comes from L2 or not?

The notification depends on the action steps: it can be sent immediately when the problem is created, or after N minutes.
Problem creation depends on how the triggers are defined and on their dependencies.
Each trigger depends on both the check interval of its item and the trigger condition.
For instance, you can set itemX to 1 minute and itemY to 5 minutes for ping loss. If you create triggers that check only the .last() value and then disconnect the cables from X and Y, ideally you'll see one trigger fire after one minute and the second one after 4 more minutes (and the dependency will kick in).
It will also work with an average over the last N minutes, etc.
However, I advise against doing anything similar in production unless you want a headache.
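
To make the timing concrete, here is a rough Python timeline of the scenario above. It is not Zabbix code: the function `simulate`, the poll schedule starting at t=0, and the rule that the L2 item is evaluated before the FW item when both are polled in the same second are all assumptions made for illustration. It also assumes that the dependency suppresses the FW problem only while the L2 trigger is already in a problem state, and that no extra L2 check is forced when the FW check fails.

```python
def simulate(outage_at, fw_interval=60, l2_interval=300, horizon=600):
    """Return (time_in_seconds, trigger_name) for every problem that would be raised.

    Both links are assumed to go down together at `outage_at` and stay down.
    """
    events = []
    fw_problem = False
    l2_problem = False
    for t in range(horizon + 1):
        # Assumption: when both items are polled in the same second,
        # the L2 item is evaluated first.
        if t % l2_interval == 0 and t >= outage_at and not l2_problem:
            l2_problem = True
            events.append((t, "L2"))
        if t % fw_interval == 0 and t >= outage_at and not fw_problem:
            # Dependency: the FW problem is only raised while L2 is still OK.
            if not l2_problem:
                fw_problem = True
                events.append((t, "FW"))
    return events
```

With an outage at t=30, `simulate(30)` gives `[(60, 'FW'), (300, 'L2')]`: the FW problem is raised first because the L2 item has not been polled yet. With an outage at t=290, `simulate(290)` gives `[(300, 'L2')]`, because the L2 poll happens to come first and the FW problem is suppressed from then on.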

Related

Temporary adjustment of delay time

I have the following problem, which I am unable to solve:
I have a situation where a security point (modelled as a delay) takes a 15-minute break every half hour. After the break, the security guards increase their speed until the queue is shorter than 10 people.
I wanted to model this as follows: a statechart with delay.set_capacity(0) after 30 minutes and delay.set_capacity(1) again after the 15-minute break. For the increased speed after the break, I added an additional state with the condition queue.size()>10, and now I want to set the action so that the delay time changes from exponential(1/10) to exponential(1/5) as long as queue.size()>10.
Does anyone have experience with which function to use in the action box? Or would you suggest a different function?

Since you are using, or at least want to use, a statechart, I would suggest the following design: composite states inside the working state indicate whether the security agent is working fast or normally, and a message transition moves it from one state to the other.
It is advisable to use a message transition and trigger it as needed instead of a condition, which gets checked on every change inside the agent and can therefore be computationally expensive.
I assume you have already implemented the correct capacity settings in the on-enter actions of the working and breaking states.
Now you simply need to send the message every time an agent enters the queue and every time it exits the delay block, and of course set the delay time based on the current state of the statechart.
See screenshot below.
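
In code terms, the design can be sketched roughly as follows. This is plain Python rather than AnyLogic, and the class `SecurityPoint`, its method names and the threshold of 10 are hypothetical stand-ins for the statechart, its message transition and the delay-time expression. It also assumes that the fast state is entered only at the end of a break and left once the queue is no longer above the threshold.

```python
import random


class SecurityPoint:
    """Toy stand-in for the statechart: 'break', or working 'normal' / 'fast'."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.state = "normal"

    def start_break(self):
        # On entering the break state the delay capacity would be set to 0.
        self.state = "break"

    def end_break(self, queue_size):
        # On entering the working state the delay capacity would be set to 1.
        self.state = "fast" if queue_size > self.threshold else "normal"

    def on_message(self, queue_size):
        """Message sent when an agent enters the queue or exits the delay."""
        if self.state == "fast" and queue_size <= self.threshold:
            self.state = "normal"

    def delay_rate(self):
        # Rate 1/5 gives a shorter mean delay (5) than rate 1/10 (mean 10).
        return 1 / 5 if self.state == "fast" else 1 / 10

    def delay_time(self, rng=random):
        # Exponentially distributed delay with the rate of the current state.
        return rng.expovariate(self.delay_rate())
```

The point of the message-driven design is that the state is re-evaluated only when something relevant happens (queue entry, delay exit), not on every change in the agent. If the rates are typed into AnyLogic as Java expressions, keep in mind that `1/10` with integer literals evaluates to 0 in Java; writing `1.0/10` avoids that.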

Make pedestrians divert to another queue if QueueTime Exceeds a preselected Value

Edited Version:
I'm modelling an airport check-in terminal. It works fine so far, but additionally I'm trying to implement a function that keeps my pedestrians from entering the service queue if the queue time exceeds a preselected value (e.g. there are already 15 passengers in the queue), so that they walk to some kind of backup service that opens during these busy times.
Here is my approach:
The variable QueueSize always holds the current number of passengers in the queue.
Every time a ped enters the pedservice block CheckInEco, the function waitingTime() is called:
QueueSize = CheckInEco.size();
if (QueueSize > 15) CheckInEco.cancel(ped)
So, as soon as there are more than 15 agents in the queue, number 16 should bypass it and move to an alternate service block, which I would connect to the ccl port of the CheckInEco service. But when building the model, I get this message: "ped cannot be resolved to a variable".
According to the AnyLogic Help, it should be possible to use this cancel call, but I'm not really experienced with it. Maybe someone can help me out?

You can simply use a select output block to prevent pedestrians from going into the service block if there are already more than 15 pedestrians in it.
Your original question had to do with waiting time; you should follow exactly the same approach. But with waiting time it gets more complicated, since you don't want to take the average waiting time from the start of the simulation. So you need to decide whether you want to take the last 10 minutes, 1 hour, etc., and whether you want to include the current waiting time of agents still in the queue. Since this is not the question anymore, I am not going to add it here; perhaps ask a new question if this is still relevant.
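
For the waiting-time variant, one possible shape of the logic is a sliding window over recently completed waits. The sketch below is plain Python, not AnyLogic; `RecentWaits`, `use_main_queue` and the parameter names are hypothetical, and it deliberately ignores agents that are still waiting, which is one of the decisions mentioned above.

```python
from collections import deque


class RecentWaits:
    """Average waiting time of agents that left the queue within the last `window` time units."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()  # (exit_time, waited), oldest first

    def record(self, exit_time, waited):
        self.samples.append((exit_time, waited))

    def average(self, now):
        # Drop samples that fell out of the window before averaging.
        while self.samples and self.samples[0][0] < now - self.window:
            self.samples.popleft()
        if not self.samples:
            return 0.0
        return sum(w for _, w in self.samples) / len(self.samples)


def use_main_queue(queue_size, recent_avg_wait=0.0, max_queue=15, max_wait=None):
    """Condition a select-output style block could evaluate for each pedestrian."""
    if queue_size > max_queue:
        return False
    if max_wait is not None and recent_avg_wait > max_wait:
        return False
    return True
```

A pedestrian for whom `use_main_queue(...)` is false would be routed to the backup service. The queue-size check alone corresponds to the simple case in the question.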

AnyLogic ResourcePoolName.idle() is not updated in real time?

I use the commands below to check the availability of a ResourcePool:
ResourcePoolName.idle()
ResourcePoolName.busy()
I found that the number of idle (or busy) units is updated late.
When the agent leaves the Service block, ResourcePoolName.busy() still reports the unit as busy and ResourcePoolName.idle() still does not report it as idle. I have to wait until the agent enters the third block after it before ResourcePoolName.busy() and ResourcePoolName.idle() are updated correctly.
How can we get the idle (or busy) units of the ResourcePool to update in real time?

Someone recommended this solution to me and it works well.
Instead of using a Service block alone, I use a Service block followed by a Delay block (with delay time = 0). Now, when the agent leaves the Delay block, ResourcePoolName.idle() is updated correctly.

Is it possible to accelerate time in grafana?

Here is what I actually want to do.
I created dashboards to monitor the alert status in Grafana.
I created fake data in my system to simulate my alert situations on these dashboards. The timestamps of this data cover the range from now to now + 12h. In practice, it takes a long time to analyze the alert status with real data. For this reason, I cannot be very flexible with my alert rules: I have to wait until the end of this period to see the alert status in the system. (I actually have many states like this.) Grafana creates pending, alerting, and ok states according to the records in my database. Is there a method to quickly verify my tests without waiting all this time?

The main problem is that it is fairly expensive to do in a data-source-agnostic way. The way it worked in Bosun is that you would select a time range, and then an interval or a number of queries to run.
Setting both From and To enables testing multiple iterations of the selected alert over time. The number of iterations depends on the settings of the two linked fields Intervals and Step Duration (3). Changing one changes the other. Intervals is the number of runs to do, evenly spaced out over the duration from From to To, and Step Duration is how much time in minutes should be between intervals. Doing a test over time will populate the Timeline tab (5), which draws a clickable graphic of severity states for each item in the set.
It would then run all those queries with a pool limiting simultaneous queries. For an interval of, say, 5 minutes, it would run adjacent 5-minute queries.
So this would speed up the alert authoring and testing workflow significantly. But it would best be implemented as a job system. This is because with more expensive queries, or a range/interval combination that amounts to a fair number of runs, it may take a minute or so, and having to wait on an open network connection for that long is less than ideal.
I found I generally used it in two modes:
To tweak a specific alert that had fired at some time
To get a general overview of how much the alert rule would trigger for the historical data
For the general overview, a larger time range is generally wanted, which means more queries if the interval is kept the same. And with a feature like FOR (Pending), you would have to use the same interval it would actually run at.
So it is possible, but it has some limitations, and some care needs to be taken to do it right. It has been extremely useful in my experience.
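
The relationship between the two linked fields and the pooled execution can be sketched as follows. This is a minimal Python illustration, not Bosun or Grafana code: `run_query` is a hypothetical callable standing in for whatever evaluates the alert over one window, and the function names and the default of four parallel queries are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def step_from_intervals(start, end, intervals):
    """Step duration implied by a number of evenly spaced runs."""
    return (end - start) / intervals


def intervals_from_step(start, end, step):
    """Number of runs implied by a step duration."""
    return int((end - start) / step)


def windows(start, end, step):
    """Adjacent (from, to) windows of length `step` covering start..end."""
    result = []
    t = start
    while t + step <= end:
        result.append((t, t + step))
        t += step
    return result


def backtest(run_query, start, end, step, max_parallel=4):
    """Run one query per window, at most `max_parallel` at a time, in window order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda w: run_query(*w), windows(start, end, step)))
```

For a one-hour range and a 5-minute step this produces 12 adjacent windows. A job system, as suggested above, would wrap `backtest` so that the caller does not have to hold a connection open while it runs.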
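
Why the evaluation interval matters for FOR (Pending) can be seen in a simplified model of the state logic. The sketch below is an assumption about how such a rule behaves in general, not Grafana's or Bosun's actual implementation; `alert_states` and its parameters are hypothetical names.

```python
def alert_states(breaches, step_seconds, for_seconds):
    """Map per-evaluation breach flags to 'ok' / 'pending' / 'alerting'.

    A breach becomes 'alerting' only once it has lasted at least `for_seconds`,
    measured in evaluation steps of `step_seconds`.
    """
    states = []
    breached_for = None  # seconds since the current breach was first seen
    for breached in breaches:
        if not breached:
            breached_for = None
            states.append("ok")
            continue
        breached_for = 0 if breached_for is None else breached_for + step_seconds
        states.append("alerting" if breached_for >= for_seconds else "pending")
    return states
```

With the flags `[False, True, True, True, False]` and a FOR of 120 seconds, a 60-second step yields `['ok', 'pending', 'pending', 'alerting', 'ok']`, while a 300-second step yields `['ok', 'pending', 'alerting', 'alerting', 'ok']`. The same data produces different state sequences, which is why a backtest has to use the interval the rule would really run at.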

Sensu Scheduler Oddness

I run < 24 checks on my systems. Servers are not regularly heavily loaded. Load averages keep well under 1 during normal operation.
I have noticed a recurring issue where the check-cpu check would start reporting high load averages on systems where there was no organic cause for high load. Further investigation showed that the high load report was actually due to the check-cpu script running in parallel with other checks. Outside of the checks executing, CPU load was fine.
I upgraded from Sensu 0.20 to 0.23 and continued to observe the same issue.
We found that a restart of the sensu-server and sensu-client services would resolve the problem for a period of time (approximately 24 hours), and then it would return.
We theorized at this point that there must be some sort of time delay in the dispatch or execution of the checks on the host, which causes this overlap to eventually occur.
All checks are set to run at an interval of 30 or 60 seconds.
I decided to set the interval of the check-cpu check to 83, and the issue has not occurred since, presumably because the check-cpu check no longer coincides with the others and therefore does not see high CPU load during that short moment.
Is this some sort of inherent scheduling issue with Sensu? Is it supposed to know how to dispatch checks with adequate spacing, or is this something that should be controlled by the interval parameter?
Thanks!

I have noticed that the checks drift in execution time, i.e. they do not run exactly every 30 seconds but every 30.001 s or something like that. I guess the drift might be different for different checks. So eventually you will run into the problem that the checks sync up and all run at the same time, which causes the behaviour you describe. Running more checks at regular intervals (30 s, 60 s, etc.) will make this problem occur more often. If you want this changed, you have to report it to the Sensu project directly. I think they might fix it eventually, since they probably want the system to be scalable.
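
A back-of-the-envelope calculation shows how such a small drift adds up. The numbers below are purely illustrative: the 1 ms drift per run echoes the "30.001 s" guess above, and the 15-second initial offset between two checks is a made-up value; neither was measured on a real Sensu installation.

```python
from math import gcd


def runs_until_aligned(offset_ms, drift_ms_per_run):
    """Runs needed for a drifting check to close an initial offset (ceiling division)."""
    return -(-offset_ms // drift_ms_per_run)


def seconds_between_coincidences(interval_a, interval_b):
    """Two aligned, drift-free checks fire together every lcm(a, b) seconds."""
    return interval_a * interval_b // gcd(interval_a, interval_b)


# A check drifting 1 ms per run, starting 15 s apart from another check.
runs = runs_until_aligned(offset_ms=15_000, drift_ms_per_run=1)
days = runs * 30.001 / 86_400  # roughly five days until the two line up
```

Under these assumptions the two checks line up after 15,000 runs, a little over five days; a larger drift or a smaller offset shortens that accordingly. The second function hints at why an interval of 83 helps: once aligned, 30-second and 60-second checks coincide every 60 seconds, whereas an 83-second check meets a 30-second one only every 2,490 seconds.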