Showing value of metric in Grafana unified alert

I have the following query for getting node memory usage:
100 -
(
avg(node_memory_MemAvailable_bytes{job="node-exporter", instance="211.111.81.111:0000"}) /
avg(node_memory_MemTotal_bytes{job="node-exporter", instance="211.111.81.111:0000"})
* 100
)
I want the output in my alert message to be something like this:
Attention, threshold of 50 is passed. Server with ip 211.111.81.111:0000 has 57% of Memory Usage. Investigate the problem
How can I template the alert text to output both the instance IP and the last value of the metric?
This is what I put together from what I found on the internet, but it is not working:
Server with ip 211.111.81.111:0000 has {{ $__value }} % of Memory Usage. Investigate the problem
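In Grafana unified alerting, the summary and description annotations are Go templates, and as far as I know $__value is a dashboard data-link variable rather than an alert-template variable, which would explain why it stays empty. A rough sketch of an annotation template, assuming the alert rule has a Reduce (or Math) expression with RefID B and that the query keeps the instance label (e.g. avg by (instance) (...) instead of plain avg(...)):
Attention, threshold of 50 is passed. Server with ip {{ $labels.instance }} has {{ printf "%.0f" $values.B.Value }} % of Memory Usage. Investigate the problem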

Related

Find min/max/average memory used by pod using Prometheus

I am trying to find the min/max/average memory consumed by a particular pod over a time interval.
Currently I am using
sum(container_memory_working_set_bytes{namespace="test", pod="test1", container!="POD", container!=""}) by (container)
Output -> test1 = 9217675264
For reporting purposes, I need to find the min/peak memory used by the pod over a time interval (6h), and the average too.
You can do that with a range vector (add an [interval] to a metric name/selector) and an aggregation-over-time function:
min_over_time(container_memory_usage_bytes{}[6h])
max_over_time(container_memory_usage_bytes{}[6h])
avg_over_time(container_memory_usage_bytes{}[6h])
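If you also need the per-container aggregation from the original query, a PromQL subquery can wrap it; a sketch, assuming the same label filters as above and a 5m resolution step:
max_over_time(sum by (container) (container_memory_working_set_bytes{namespace="test", pod="test1", container!="POD", container!=""})[6h:5m])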

How to exclude spikes from SumoLogic alert?

We have a SumoLogic alert that fires if more than 10 errors are logged in 60 minutes.
I would prefer something like:
if there is a spike and all the errors happen in e.g. 1 minute (consider the issue as auto-resolved), do not generate an alert.
How can I write such a SumoLogic query?
Variants of the requirement:
Logs have a clientIp field; if all errors are reported for the same client, do not generate an alert (the problem is with a particular client, not with the application).
If more than 10 errors are logged in 60 minutes, send an alert, unless the errors are of type A; but if there are more than 100 errors of type A, send the alert (type A errors are acceptable, unless the number is too big).
If more than 10 errors are logged in 60 minutes, send an alert only if the last error happened less than 30 minutes ago (otherwise consider it auto-fixed).
I am not fully sure how your data is shaped, but...
if there is a spike and all the errors happen in e.g. 1 minute (consider the issue as auto-resolved), do not generate an alert.
This you can solve by aggregating:
| timeslice 1m
| count by _timeslice
| where _count > 1
or similar.
if all errors are reported for the same client, do not generate an alert
It sounds like:
| count by _timeslice, clientIp
would do the job.
if more than 10 errors are logged in 60 minutes, send an alert, unless the errors are of type A; but if there are more than 100 errors of type A,
A rough sketch of the query clause would be:
| if(something, 1, 0) as is_of_type_A
| count by is_of_type_A, ...
| where (is_of_type_A = 1 and _count > 100)
OR (is_of_type_A = 0 and _count > 10)
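Putting the pieces together, a rough end-to-end sketch for the type A requirement might look like the following; the search scope and the errorType field are assumptions about how your logs are shaped, not real field names:
_sourceCategory=your/app error
| timeslice 60m
| if (errorType = "A", 1, 0) as is_of_type_A
| count by _timeslice, is_of_type_A
| where (is_of_type_A = 1 and _count > 100) or (is_of_type_A = 0 and _count > 10)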
Disclaimer: I am currently employed by Sumo Logic.

How to fix "error in module SensorManager during network initialization" in Castalia

I've been trying to run the valueReporting simulation using Castalia. I've edited the ini file in order to add 2 sensor devices (weight and blood glucose) to a node. However, I'm facing the following error when running the simulation:
Error in module (SensorManager) SN.node[0].SensorManager (id=10) during network initialization: Model error:
[Sensor Device Manager]: The parameters of the sensor device manager are not initialized correctly in omnet.ini file..
Here's a sample of the omnetpp.ini file.
Does anybody have any idea why I'm getting this error? If so, how can I fix it?
Thank you!
If you searched the code for the error message "The parameters of the sensor device manager are not initialized correctly in omnet.ini file", you would find it in SensorManager.cc.
You could then figure out that this error is triggered when the number of values in any of these 9 parameters does not match the number of sensor devices you have on the node. These are the 9 parameters:
SN.node[0].SensorManager.sensorTypes
SN.node[0].SensorManager.corrPhyProcess
SN.node[0].SensorManager.pwrConsumptionPerDevice
SN.node[0].SensorManager.maxSampleRates
SN.node[0].SensorManager.devicesBias
SN.node[0].SensorManager.devicesNoise
SN.node[0].SensorManager.devicesSensitivity
SN.node[0].SensorManager.devicesResolution
SN.node[0].SensorManager.devicesSaturation
You only define the first two correctly in your ini file. All the rest have the default values, which cover only one sensor device; you need to include two values for each. You can look at SensorManager.ned to see the default values these parameters take, and then simply copy those values or change them according to your needs.
For example, the devicesNoise default value is "0.1", so for two sensing devices it can be "0.1 0.1".
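A minimal sketch of the extra ini entries for two sensing devices could look like the lines below; the values are only illustrative, so take the real defaults from SensorManager.ned and adjust them to your devices:
SN.node[0].SensorManager.pwrConsumptionPerDevice = "0.02 0.02"
SN.node[0].SensorManager.maxSampleRates = "1 1"
SN.node[0].SensorManager.devicesBias = "1 1"
SN.node[0].SensorManager.devicesNoise = "0.1 0.1"
SN.node[0].SensorManager.devicesSensitivity = "0 0"
SN.node[0].SensorManager.devicesResolution = "0.001 0.001"
SN.node[0].SensorManager.devicesSaturation = "1000 1000"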

SB37 JCL error: too small a size?

The following space allocation is giving me an SB37 JCL error. The COBOL record size of the output file is 100 bytes and the LRECL is 100 bytes. What do you think is causing this error? I have tried increasing the size to 500,100 and still get the same error.
Code:
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
// DCB=(LRECL=100,BLKSIZE=,RECFM=FBM),
// SPACE=(CYL,(10,5),RLSE)
Try to increase not only the space, but the volume count as well.
Include VOL=(,,,#) in your DD. # is the number of volumes you want to allocate.
Ex: SPACE=(CYL,(10,5),RLSE),VOL=(,,,3) - includes 3 volumes.
Additionally, you can increase the size, but try to stay within reasonable limits :)
The documentation for B37 says the application programmer should respond as indicated for message IEC030I. The documentation for IEC030I says, in part...
Probable user error. For all cases, allocate as many units as volumes
required.
...as noted in another answer. However, be advised that the documentation for the VOL parameter of the DD statement says...
If you omit the volume count or if you specify 1 through 5, the system
allows up to five volumes; if you specify 6 through 20, the system
allows 20 volumes; if you specify a count greater than 20, the system
allows 5 plus a multiple of 15 volumes. You can override the maximum
volume count in data class by using the volume-count subparameter. The
maximum volume count for an SMS-managed mountable tape data set or a
Non-managed tape data set is 255.
...so for DASD allocations you are best served specifying a volume count greater than 5 (at least).
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
// DCB=(LRECL=100,BLKSIZE=,RECFM=FBM),
// SPACE=(CYL,(10,5),RLSE)
Try this instead. Notice that the secondary will take advantage of a large data set, whereas without that parameter the largest secondary that makes any sense is < 300. Oh, and if indeed it is from a COBOL program, make sure that the FD says "BLOCK 0"! If it isn't "BLOCK 0", then you might not even need to change your JCL, because the data set wasn't fixed block machine: it was merely fixed and unblocked, so the space would almost never be enough. And finally, you may wish to revisit why you have the M in the RECFM to begin with. Notice also that I took out the LRECL, the BLKSIZE and the RECFM. That is because the FD in the COBOL program is all you need, and putting them in the JCL is not only redundant but dangerous, because any change will now have to be made in multiple places.
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
// DSNTYPE=LARGE,UNIT=(SYSALLDA,59),
// SPACE=(CYL,(10,1000),RLSE)
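For the "BLOCK 0" point above, a minimal sketch of the COBOL FD is shown below; the file and record names are placeholders, not taken from the original program:
FD  OUT-FILE
    BLOCK CONTAINS 0 RECORDS
    RECORDING MODE IS F.
01  OUT-RECORD      PIC X(100).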
There is a limit of 65,535 tracks for a data set on one volume. So if you specify a SPACE that exceeds that limit, the system will simply ignore it.
You can increase this limit to 16,777,215 tracks by adding the DSNTYPE=LARGE parameter.
Or you can specify that your data set is multi-volume by adding VOL=(,,,3).
You can also use the DATACLAS=xxxx parameter here; however, first of all you need to find one. The easy way is to contact your local Storage Team and ask for one. Or, if you are familiar with ISPF navigation, you can enter the ISMF;4 command to open a panel and
use the parameters below before hitting Enter:
CDS Name . . . . . . 'ACTIVE'
Data Class Name . . *
It should produce a list of all available data classes. Find the one that suits you (has a large enough volume count and does not limit primary and secondary space).
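Putting those options together, a hedged sketch of the DD statement could look like this; the space and volume numbers are only examples:
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
// DSNTYPE=LARGE,VOL=(,,,3),
// SPACE=(CYL,(10,50),RLSE)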

Snort not showing blocked/dropped packets

I'm trying to detect ping flood attacks with Snort. I have included the rule
(drop icmp any any -> any any (itype:8; threshold, track by_src, count 20, seconds; msg:"Ping flood attack detected"; sid:100121))
in Snort's ddos.rule file.
I'm attacking using the command
hping3 -1 --fast
The ping statistics on the attacking machine say
100% packet loss
However, the Snort action stats show the verdict as
Block ->0.
Why is this happening?
A few things to note:
1) This rule is missing the value for seconds. You need to specify a timeout value; you currently have "seconds;", but you need something like "seconds 5;". Since this is not valid, I'm not sure when Snort is actually going to generate an alert, which means it may just be dropping all of the icmp packets but not generating any alerts.
2) This rule is going to drop EVERY icmp packet for itype 8. The threshold only specifies when to alert, not when to drop. So this is going to drop all packets that match and then generate 1 alert per 20 that it drops. See the manual on rule thresholds here.
3) If you do not have snort configured in inline mode, you will not be able to actually block any packets. See more on the information about the three different modes here.
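For reference, a typical way to run Snort inline with the afpacket DAQ, bridging two interfaces, looks roughly like this; the interface names and config path are placeholders:
snort -Q --daq afpacket -i eth0:eth1 -c /etc/snort/snort.conf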
If you just want to detect and drop ping floods, you should probably change this to use the detection_filter option instead of threshold. If you want to allow legitimate pings and drop ping floods, you do not want to use threshold, because the way you have this rule written it will block all icmp itype 8 packets. If you use detection_filter, you can write a rule such that if Snort sees 20 pings in 5 seconds from the same source host, it then drops them. Here is an example of what your rule might look like:
drop icmp any any -> any any (itype:8; detection_filter:track by_src, count 20, seconds 5; sid:100121)
If Snort sees 20 pings from the same source host within 5 seconds of each other, it will then drop them and generate an alert. See the Snort manual for detection filters here.
With this configuration, you can allow legitimate pings on the network and block ping floods from the same source host.