Nagios not sending emails

I want to set up Nagios to send email notifications.
I can send email notifications manually by clicking "Send custom service notification" in the Nagios web interface. The notification is created and the email is sent and delivered successfully.
But Nagios doesn't send notifications automatically. I tested this by making the machine ignore pings (echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all). Nagios sets the PING service to the CRITICAL state, but doesn't send a notification email.
These are my config files:
Part of templates.cfg
define contact{
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email ; send service notifications via email
host_notification_commands notify-host-by-email ; send host notifications via email
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
Part of contacts.cfg
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
email MY-EMAIL@gmail.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
generic-host_nagios2.cfg
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
check_command check-host-alive
max_check_attempts 10
notification_interval 1
notification_period 24x7
notification_options d,u,r,f,s
contact_groups admins
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
generic-service_nagios2.cfg
define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_interval 1 ; Only send notifications on status change by default.
is_volatile 0
check_period 24x7
normal_check_interval 5
retry_check_interval 1
max_check_attempts 4
notification_period 24x7
notification_options w,u,c,r,f,s
contact_groups admins
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
How can I force nagios to send notification emails?

I had a similar issue. It turned out to be a combination of two problems:
1) I was not waiting long enough for the alerts. Add up normal_check_interval plus retry_check_interval * max_check_attempts for a service and you'll see that you may have to wait as long as 9 minutes (5 + 1*4 with your values) before a notification goes out. Decrease normal_check_interval and max_check_attempts if you need to know about service failures faster (see the sketch after this list). Note that with the default Nagios configuration it may take as much as 15 minutes before it notifies you.
2) The default configuration for linux-server is to notify only during workhours, but the server in question was on UTC. Make sure the notification_period variable is set to 24x7 everywhere.
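For example, a minimal sketch of a tightened service template (the template name and values are illustrative, not recommendations):
define service{
name fast-alert-service ; hypothetical template name
use generic-service
normal_check_interval 2 ; check every 2 minutes while OK
retry_check_interval 1 ; re-check every minute after a failure
max_check_attempts 2 ; go HARD (and notify) after 2 failed attempts
notification_period 24x7 ; notify around the clock, not just workhours
register 0
}
With these values a failure is notified after at most 2 + 1*2 = 4 minutes.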
Good luck.

Check the main nagios.cfg file and make sure enable_notifications=1 is set. Also verify that notifications are enabled on your contact template (the host_notifications_enabled and service_notifications_enabled directives).
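For example, the lines to look for (a sketch; paths vary by install, but these are the stock directive names):
# in nagios.cfg
enable_notifications=1
# in the generic-contact template (templates.cfg)
host_notifications_enabled 1
service_notifications_enabled 1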

Please also consider the flap_detection_enabled setting. If flap detection is enabled and Nagios determines that a service is flapping, it suppresses notifications. Flapping means a service is changing between OK and non-OK states too frequently. During testing it's common for a service to "flap" as you repeatedly break and restore it to make sure everything works. Disable the flap_detection_enabled setting in both your host and service config files during testing.
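For testing, a sketch of the relevant overrides (standard Nagios directives):
# in the host and service templates
flap_detection_enabled 0
# or globally, in nagios.cfg
enable_flap_detection=0
Remember to re-enable flap detection once testing is done.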


How to make a published queue automatically durable?

I am experimenting with NATS Streaming Server and it looks quite promising so far.
However, it appears queues can only be durable after a durable subscription has been created for them.
That certainly makes sense, but how does it work in practice in a microservices architecture?
For instance, suppose Service1 is pumping messages out to a queue that is not yet durable and has no listeners. Some time later the corresponding consumer service starts and makes that queue durable. Do you just accept this (hopefully short) loss, or ensure the consumer service is started first?
Sorry for the delay. In NATS Streaming, any message published to a channel is stored, regardless of subscription interest. You can experiment by publishing, say, 3 messages on "foo". Then you can start a subscription (even a non-durable one) and replay those messages; it is just a matter of specifying the starting point of the subscription. For instance, there is an option to deliver "all available". Using the Go stan samples, it would be:
$ go run examples/stan-pub/main.go foo msg1
Published [foo] : 'msg1'
$ go run examples/stan-pub/main.go foo msg2
Published [foo] : 'msg2'
$ go run examples/stan-pub/main.go foo msg3
Published [foo] : 'msg3'
$ go run examples/stan-sub/main.go -id "me" -all foo
Connected to nats://127.0.0.1:4222 clusterID: [test-cluster] clientID: [me]
Listening on [foo], clientID=[me], qgroup=[] durable=[]
[#1] Received: sequence:1 subject:"foo" data:"msg1" timestamp:1583947471103854000
[#2] Received: sequence:2 subject:"foo" data:"msg2" timestamp:1583947472684693000
[#3] Received: sequence:3 subject:"foo" data:"msg3" timestamp:1583947473990567000
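Programmatically, the same replay looks roughly like this with the stan.go client (a minimal sketch; the cluster ID, client ID, and durable name are illustrative):
package main

import (
	"fmt"
	"log"

	stan "github.com/nats-io/stan.go"
)

func main() {
	// Connect to the streaming cluster (IDs are illustrative).
	sc, err := stan.Connect("test-cluster", "me")
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()

	// DeliverAllAvailable replays every message stored on the channel,
	// including ones published before this subscription existed.
	// DurableName makes the subscription durable from this point on.
	_, err = sc.Subscribe("foo", func(m *stan.Msg) {
		fmt.Printf("[#%d] Received: %q\n", m.Sequence, string(m.Data))
	}, stan.DeliverAllAvailable(), stan.DurableName("me-durable"))
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the subscriber running
}
Note that closing the connection (rather than unsubscribing) preserves the durable state, so a restarted service resumes where it left off.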

Nagios retry interval when they have OK or UP state

I have added a Linux host to a Nagios monitoring server using the NRPE plugin.
For this I followed the guide below:
http://www.tecmint.com/how-to-add-linux-host-to-nagios-monitoring-server/
I need to check some services on that Linux host.
To monitor the host and its services I am watching the Nagios log (/usr/local/nagios/var/nagios.log).
At first everything is fine, and the log shows this status:
SERVICE ALERT: test.testing.local;Service Tomcat;OK;SOFT;6;TOMCAT OK
When my service status changes to a non-OK state, the log shows:
SERVICE ALERT: test.testing.local;Service Tomcat;CRITICAL;SOFT;4;TOMCAT CRITICAL
But I also want the log to show an entry again after 1 minute even when the service status has not changed to a non-OK state:
SERVICE ALERT: test.testing.local;Service Tomcat;OK;SOFT;6;TOMCAT OK
and currently that is not happening.
My services.cfg file content is given below
define service {
host_name test.testing.local
service_description Service Tomcat
check_command check_nrpe!check_service_tomcat
max_check_attempts 10
check_interval 1
retry_interval 1
active_checks_enabled 1
check_period 24x7
register 1
}
I am using Nagios 4.2.2 and CentOS 7.
I think what you are after is explained in the Nagios 4 Core docs:
check_interval: This directive is used to define the number of "time units" between regularly scheduled checks of the host. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
retry_interval: This directive is used to define the number of "time units" to wait before scheduling a re-check of the host. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
Since your check_interval and retry_interval are both set to 1 (quite frequent; with interval_length at its default of 60 seconds, these are minutes), a change to a non-OK state makes Nagios retry every 1 minute up to max_check_attempts (10 in your config) times. Once the status stops changing, the state goes HARD and checks revert to the normal check_interval schedule; an OK result is then logged as a state change (recovery) rather than on every check, which is why you don't see the OK line repeated every minute.
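To make the arithmetic concrete (an illustration of the scheduling described above, not actual log output):
check_interval=1, retry_interval=1, max_check_attempts=10
t=0      check fails -> CRITICAL (SOFT, attempt 1)
t=1..8   retried every retry_interval -> CRITICAL (SOFT, attempts 2..9)
t=9      attempt 10 fails -> CRITICAL (HARD), notifications begin
t>9      re-checked every check_interval; a recovery is logged once as
         a state change, not on every minute the service stays OK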

Setting hourly email limit to delivery server via pmta config file

I have an email server running PowerMTA. Someone recommended that I follow this link for IP warmup in order not to get blacklisted. I am using MailWizz with 7 IPs.
I tried to set limits for the delivery servers via /etc/pmta/config, changing the config file by adding max-msg-rate 25/h.
I then restarted PMTA with /etc/init.d/pmta restart.
I tried again, but it is still exceeding the limit.
Is there anything I did wrong?
max-msg-rate 25/h is a domain-scoped directive, so you need to apply it inside each <domain> section you send to, or in <domain *>. I'm not sure about that directive specifically, but some settings require a reload rather than a restart for the configuration change to take effect (specifically, adding/changing the admin access IP for the PowerMTA web monitor).
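For example, applied globally (a sketch; max-msg-rate is a documented PowerMTA domain directive, and pmta reload is the standard CLI command):
# /etc/pmta/config
<domain *>
    max-msg-rate 25/h
</domain>
# apply the change without a full restart
pmta reload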
You have to add a block like the following:
<domain *>
max-smtp-out 5
max-msg-per-connection 20
max-msg-rate 10000/d
bounce-after 12h
</domain>
For warming up you have to use the following delivery parameters:
cold-virtual-mta [virtual-mta-here]
<domain *>
max-cold-virtual-mta-msg 1000/day
</domain>
Regards!

Configuring postfix to only send to a specified domain

In order not to accidentally send real emails to people outside the company from an integration test server, I'd like to configure Postfix to only send emails to addresses like *@somecompany.com and drop all other emails. Is it possible to configure this in /etc/postfix/main.cf, and if yes, how?
You can do that with the help of the /etc/postfix/transport file.
Add the line transport_maps = hash:/etc/postfix/transport to main.cf.
Then do the steps below to create a transport, transport1, so that mail sent to @somecompany.com goes through transport1 while all other mail goes through the default transport.
First, stop any duplicate Postfix instances.
Open /etc/postfix/main.cf and set inet_interfaces = all.
Add the following to master.cf (the -o lines must begin with whitespace so they are read as continuation lines):
transport1 unix - - n - 1 smtp
  -o smtp_bind_address=
  -o syslog_name=postfix-localroute1
Add the following to /etc/postfix/transport:
somecompany.com transport1:
Run postmap after defining the transport file:
postmap /etc/postfix/transport
The transport defined above means all mail to @somecompany.com will go through transport1, and its entries will appear in the maillog under the syslog name postfix-localroute1 instead of the default.
Finally, add the following to main.cf:
transport_maps = hash:/etc/postfix/transport
Run: postmap /etc/postfix/transport
Reload postfix: postfix reload
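Note that the transport map above only routes mail for somecompany.com through a dedicated transport; by itself it does not drop mail to other domains. One way to actually drop everything else (a sketch, not part of the answer above; it relies on the wildcard pattern documented in transport(5) and the discard service that stock Postfix defines in master.cf) is a catch-all entry:
# /etc/postfix/transport
somecompany.com   transport1:
*                 discard:
Rebuild the map and reload afterwards:
postmap /etc/postfix/transport
postfix reload
You can verify a lookup with: postmap -q somecompany.com hash:/etc/postfix/transport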

Can I open a clustered MQ queue for writing in Perl?

If I have a WebSphere MQ queue defined on another queue manager in the cluster, is there a way I can open it for writing using the Perl interface? The code below comes back with MQRC 2085 (MQRC_UNKNOWN_OBJECT_NAME).
$messageQ = MQSeries::Queue->new
(
    QueueManager => $qMgr,
    Queue        => $queue,
    Options      => $openOpt
) or die ">>>ERROR2: Unable to open the queue: $queue\n";
Yes! The Perl modules are a thin veneer over the WMQ API and expose all the basic options and most of the really esoteric stuff as well.
When you open a queue, WebSphere MQ performs name resolution on the values you provide for Queue and QMgr names. If you provide both a Queue and a QMgr name then the object reference is fully qualified and WMQ will attempt to open it as named. So if the name you provide is the local QMgr and the clustered queue does not have a locally defined instance, the open will fail with a 2085 Unknown Object Name.
The trick to opening a clustered queue is to provide a null value for the QMgr name. This causes name resolution to check the local QMgr for a queue of the same name; then, finding none, it checks the cluster repository and resolves the open to the clustered queue. Note that the queue must be advertised to the cluster for this to work. Specifically, the CLUSTER or CLUSNL attribute of the target queue must be non-blank and refer to a cluster that the source QMgr participates in. Similarly, the destination QMgr must also participate in the same cluster as the source QMgr.
Note also that if you specify a QMgr name on the open that is not the local QMgr, then WMQ will attempt to resolve the QMgr name only. If it can resolve a route to that QMgr then it will send the message there. This means that in a cluster you can send a message to any queue on any QMgr so long as you know the fully-qualified name.
Finally, you can define a local alias over a clustered queue. For example, if you are on QMGRA and DEF QA(TARGET.QUEUE) TARGQ(TARGET.QUEUE), and then on QMGRB and QMGRC in the same cluster you DEF QL(TARGET.QUEUE) CLUSTER(MYCLUS), then it is possible to open QMGR=QMGRA QUEUE=TARGET.QUEUE and still have it work as expected. Note that the alias is NOT advertised to the cluster but the target queue is. The only issue with this approach is that the first time the alias is opened, the API call may fail if the cluster query takes too long. When I do this in Production, I always use amqsput on the alias ahead of time to make the QMgr query the repository before the actual application opens the queue.
Why would you do this? If security is a concern, you probably don't want to authorize all apps directly to the cluster XMitQ because, as noted above, they could then put a message onto any queue on any QMgr in the cluster, including SYSTEM.ADMIN.COMMAND.QUEUE. The alias gives you a place to hang authorizations and restrict the user to specific destinations in the cluster.
So, short answer: make sure you provide a null QMgr name on the open call, or set up a local alias over the clustered queue. For more about the security aspects of this, see the WMQ Security presentation at http://t-rob.net/links
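For illustration, a minimal sketch of the first approach using the MQSeries CPAN modules (the QMgr and queue names are hypothetical; the point is that no object-level QMgr name is passed, so name resolution can fall through to the cluster repository):
use MQSeries;
use MQSeries::QueueManager;
use MQSeries::Queue;
use MQSeries::Message;

# Connect to the local queue manager (name is hypothetical).
my $qmgr = MQSeries::QueueManager->new(QueueManager => 'QMGRA')
    or die "Unable to connect to queue manager\n";

# Open the clustered queue for output. No object-level QMgr name is
# supplied, so the MQOD ObjectQMgrName stays blank and WMQ can resolve
# the name through the cluster repository.
my $queue = MQSeries::Queue->new(
    QueueManager => $qmgr,
    Queue        => 'TARGET.QUEUE',
    Mode         => 'output',
) or die "Unable to open TARGET.QUEUE\n";

$queue->Put(Message => MQSeries::Message->new(Data => 'hello'))
    or die "Unable to put message\n";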