Zabbix: global trigger dependency for all triggers

I have a Zabbix server at home, connected via a not-100%-stable 4G link, and I would like some kind of global trigger dependency (or, let's say, alert dependency) so that I am not notified about errors while the internet connection is down.
Most of the monitored hosts and web services are outside my network. When there is a WAN problem (LTE link down, etc.), Zabbix collects the failed trigger alerts and tries to notify me about them but cannot (they sit in the email queue). After some time the internet connection returns to normal, and I receive PROBLEM and immediately OK messages for every test Zabbix couldn't perform during that period. So it ends up in a huge email storm after the WAN link is restored.
I know that I can set up a dependency on each trigger, but I have about 100 hosts and roughly 1,000 triggers.

Zabbix cannot handle this situation well; take a look at Nagios or Check_MK instead, which have dependencies between hosts.
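That said, one workaround within Zabbix itself is a single master "internet down" trigger on the Zabbix server host, for example an ICMP ping of an external reference address, with every other trigger made dependent on it so that notifications are suppressed while the WAN link is down. A sketch, with placeholder host and target names (old-style trigger expression syntax):

    {Zabbix server:icmpping[8.8.8.8].max(#3)}=0

The ~1,000 dependencies don't have to be clicked in by hand; they can be added in bulk through the JSON-RPC API by looping trigger.update over the IDs returned by trigger.get. The trigger IDs and auth token below are placeholders:

    {
      "jsonrpc": "2.0",
      "method": "trigger.update",
      "params": {
        "triggerid": "13367",
        "dependencies": [ { "triggerid": "13366" } ]
      },
      "auth": "<session token>",
      "id": 1
    }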

Related

How to Confirm PostgreSQL on Ubuntu VM is communicating with External Server for Updates

I have an Ubuntu VM installed on a client's VMware system. Recently, the client's IT informed us that his firewall has been detecting consistent potential port scans to our VM's internal IP address (coming from 87.238.57.227). He asked if this was part of a known package update process on our VM.
He sent us a firewall output where we can see several instances of the port scan, but there are also instances of our Ubuntu VM trying to communicate back to the external server on port 37258 (this is dropped by the firewall).
Based on a Google lookup, the hostname of the external IP address is "feris.postgresql.org", with the ASN pointing to a European company called Redpill-Linpro. As far as I can tell, they offer IT consulting services specializing in open-source software (like PostgreSQL, which is installed on our VM). I have never heard of them before, though, and have no idea why our VM would be communicating with them or vice versa. I'm also not sure if I'm interpreting the IP lookup information correctly: https://ipinfo.io/87.238.57.227
I'm looking for a way to confirm or disprove that this is just our VM checking for a standard Postgres update. If that's the case, I'd like to restrict this behaviour. We would prefer to do these types of updates manually and limit communication out of the VM to what is strictly necessary for the functionality of our application.
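A few checks would confirm or rule out the apt theory (paths are stock Ubuntu defaults):

    # Is the PGDG repository configured as a package source?
    grep -r "apt.postgresql.org" /etc/apt/sources.list /etc/apt/sources.list.d/

    # Are the periodic apt jobs, which refresh package lists, enabled?
    systemctl list-timers 'apt-daily*'
    cat /etc/apt/apt.conf.d/20auto-upgrades

If the repository shows up and the timers are active, the outbound traffic is most likely the daily apt metadata refresh, and removing the source or disabling the periodic jobs would restrict it.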
Update
I sent an email to Redpill's abuse account. They responded quickly saying that the server should not be port scanning anyone and if it appears that way, something is wrong.
The server is part of a cluster of machines that serves apt.postgresql.org, among other Postgres download sites. I don't think we have anything like Ansible or Puppet installed that would automatically check for updates, but I will look into that to make sure. Could Ubuntu updating the MOTD with the number of available packages explain why our VM is trying to reach the external Postgres server?
The abuse rep said that in any case there should only be outgoing connections from the VM, not incoming. He asked for some additional info, so I will keep communicating with him and try to update this post accordingly.
My communication with the client's IT dropped off so I did not get a definitive answer on this, but I'll provide some new details:
I reached out to the abuse email for Redpill-Linpro. He got back to me and confirmed the server corresponding to the detected IP address is part of a cluster that hosts postgres download sites, including apt.postgresql.org. He was surprised to learn we had detected a port scan from their server and seems eager to figure out why that is happening.
He asked if the client IT could pass along some necessary info for them to set up tracking on that server. But the client IT never got back to me. I think he was satisfied that it wasn't malicious and stopped pursuing it.
Here's one of the messages the abuse rep sent me that may be relevant:
That does look a lot like the TCP to the apt download server, yes. It's strange that your firewall reports that many incoming connections, but they could be fallout from some connection tracking that's not operating as intended. The timing appears to be matching up more or less perfectly. And there should definitely not be any ping-back connections from it.
Since you appear to be using the http version of the server (and not https), bringing the data in cleartext, they should be able to just dump the TCP connection contents and verify exactly what it does. But I bet they are going to see a number of http requests initiated by the apt client that is checking for updates.
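To verify this from the VM side along the lines the rep suggests, a capture like the following would show the cleartext HTTP requests (port 80 assumed, since he notes the http rather than https version of the server is in use):

    # Dump the contents of any conversation with the server in question
    sudo tcpdump -i any -A host 87.238.57.227 and tcp port 80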

How to keep track of the number of clients connecting to a server

I'm building a software agent that runs on a server; this agent acts as a server manager, i.e. starting/stopping Docker containers, monitoring, etc.
The server will host/serve many services; these services are programs running in Docker containers, one program/service per container.
There may be many such servers, and they aren't necessarily high-performance machines; they range from small VMs to high-performance computers. Right now I assume that every service uses HTTP to serve requests.
The function I want to implement in this agent is tracking the number of clients currently connecting to (requesting from) the server, either per server (e.g. server A is processing 500 requests) or per program (e.g. program A is processing 100 requests, program B is processing 200 requests).
I want to know this number because I want to do workload balancing across servers that host the same service.
The following are the ideas I have:
Implement a load balancer/reverse proxy inside the agent (I would use this load balancer: https://github.com/nwoodthorpe/Load-Balancer-Golang). This may be my last choice, because I think it would use quite a lot of resources just for load balancing.
Let the service programs running on the server tell the agent whenever they start and finish processing a request. I would simply implement a UDP socket server in the agent that listens for datagrams carrying a unique request ID (actually, anything that lets me distinguish the specific request being processed) and a status saying whether it has started or finished processing (sketched below).
So I would like to ask for suggestions on the above approaches: which one is better, and how should I implement it? Is there a better approach altogether?
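For the second idea, here is a minimal sketch of the agent side in Java; the port, the "<service> start|done" datagram format, and the class name are my assumptions rather than any fixed protocol:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.util.HashMap;
    import java.util.Map;

    public class RequestCounter {
        public static void main(String[] args) throws Exception {
            // In-flight request count per service name
            Map<String, Integer> inFlight = new HashMap<>();
            try (DatagramSocket socket = new DatagramSocket(9999)) {
                byte[] buf = new byte[256];
                while (true) {
                    DatagramPacket packet = new DatagramPacket(buf, buf.length);
                    socket.receive(packet);
                    // Expected payload: "<service> start" or "<service> done"
                    String[] parts = new String(packet.getData(), 0, packet.getLength())
                            .trim().split("\\s+");
                    if (parts.length != 2) {
                        continue; // ignore malformed datagrams
                    }
                    int delta = "start".equals(parts[1]) ? 1
                              : "done".equals(parts[1]) ? -1 : 0;
                    inFlight.merge(parts[0], delta, Integer::sum);
                    System.out.println("in flight: " + inFlight);
                }
            }
        }
    }

One caveat with this design: UDP is lossy, so if a "done" datagram is dropped the counter for that service drifts upward. It would be worth having services periodically report an absolute in-flight number, rather than relying on deltas alone.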

Running a Sensu handler on the client instead of the server

I have the following problem: I am using Sensu to monitor some Raspberry Pis. I'm using standalone checks, which works just fine. Now, sometimes one of the Pis loses its Wi-Fi connection, or gets restarted manually and DHCP fails, or has no internet connection for some other reason. The idea is to let the Pi check itself for an internet connection, and if the check fails, it should solve the problem by itself, e.g. by restarting the Wi-Fi or rebooting.
Of course a simple bash script with a cron job would do the job, but I want to do the check with Sensu. The problem is obvious: if the check fails, I don't have an internet connection and therefore can't send the check result to the Sensu server.
Long story short ;) is it possible to implement something like the remediation feature purely on the client, so that a handler on the client itself starts the script that should resolve the problem?
I don't think this is possible. Standalone checks are scheduled by the client, but the check result is still published to the server; the result is then handled by a handler that resides on the server.
You could write a standalone "check" plugin that monitors the Wi-Fi and, if it is down, turns it back on. It isn't using a handler, though.
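A sketch of that idea, with the remediation folded into the check script itself; the interface name, ping target, and paths are assumptions:

    #!/bin/bash
    # check-wifi.sh: OK when the internet is reachable; otherwise try to
    # recover the interface and report CRITICAL so the outage is still
    # visible in Sensu once connectivity returns.
    if ping -c 2 -W 2 8.8.8.8 > /dev/null 2>&1; then
      echo "WifiCheck OK: internet reachable"
      exit 0
    fi
    ifdown wlan0 && ifup wlan0   # or reboot, as a last resort
    echo "WifiCheck CRITICAL: internet unreachable, restarted wlan0"
    exit 2

It would then be registered as a standalone check on the client, e.g.:

    {
      "checks": {
        "wifi": {
          "command": "/etc/sensu/plugins/check-wifi.sh",
          "standalone": true,
          "interval": 60
        }
      }
    }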

Get all the running instances using Nagios

I have a certain number of hosts running different servers, all of which have the Nagios plugins installed. I want to write a script that tells me daily whether all the instances are up and running.
I tried Opsview, but due to certain restrictions I couldn't go ahead with it, so I decided to use the Nagios plugins directly. I thought about NRPE, but that is for running a plugin remotely (provided you already know the address of the host); in my case, I want to know whether someone added a new server overnight, whether some server failed, and which servers are running.
Nagios doesn't do discovery. You configure it with a list of machines and services to check.
Assuming we're talking about cloud servers, AWS can send you a message when a new server is added (see the documentation); the message can be delivered via SNS or SQS. These notifications could be read to rebuild your Nagios configuration to match the auto-scaling group.
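Each instance reported by the notification would then be rendered into an object definition and Nagios reloaded. A minimal generated entry might look like this (host name, address, and template are placeholders):

    define host {
        use        linux-server    ; host template assumed to exist
        host_name  web-01
        address    10.0.1.23
    }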

WebSphere MQ Explorer on Windows

I am a complete beginner in the WebSphere MQ world, and this is what I'm trying to do:
I have to create a simple system with two machines (sender and receiver) that share messages on a queue:
PC 1 sender --> Queue --> PC 2 receiver
Both machines are Windows-based and are actually on the same physical PC using VirtualBox: PC 1 (host) and PC 2 (guest).
Here is what I have done following online guides:
PC 1 (sender):
WebSphere MQ (full trial) installed
In MQ Explorer:
queue manager "QM.01" created
local queue "Q.01" created with Usage=Transmission
sender channel "CH.01" created with transmission queue=Q.01, and some doubts about the connection, which is currently on port 1414
PC 2 (receiver):
only MQ Explorer installed
tried to add a remote queue manager with the sender's IP, port 1414, and the CH.01 channel --> error 2539 (something wrong in the PC 1 configuration)
tried to add a remote queue manager with the sender's IP, port 1414, and the default SYSTEM.ADMIN.SVRCONN channel --> error 4036 (something wrong with account authentication; I tried to use the same "Administrator#PC 1" user, and I've also tried to add the remote queue manager on PC 1 itself, with the same result)
I suppose my error could be in the PC 1 channel configuration: its icon has a yellow or blue triangle, and status=trying doesn't look good.
PS: forgive me if some setting names don't match the English version; I have to translate them.
Now that I've been able to configure a remote QMgr on the client PC, I would like to learn how to write a simple program (maybe in Java) to read from a queue on the remote queue manager.
I've found a few guides, but before starting with Java I tried to test amqsget and amqsput from the command prompt.
There are no problems on the server machine (with the WebSphere MQ full trial installed), but the console doesn't recognize the commands on the client (with both the WebSphere MQ client and MQ Explorer installed).
Where are my mistakes, or what step have I missed?
When you have an application that needs to talk to a QMgr over the network, you create SVRCONN channels such as SYSTEM.ADMIN.SVRCONN. The application using a SVRCONN channel is able to open queues directly and put or get messages from them. There is no need to create a transmission queue or set USAGE=XMITQ in order for client applications to work.
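For example, a client application connecting over a SVRCONN can read a queue with just the WMQ classes for Java; the host address is a placeholder, the object names are taken from the question, and error handling is omitted, so treat this as a sketch rather than production code:

    import com.ibm.mq.MQEnvironment;
    import com.ibm.mq.MQMessage;
    import com.ibm.mq.MQQueue;
    import com.ibm.mq.MQQueueManager;
    import com.ibm.mq.constants.CMQC;

    public class QueueReader {
        public static void main(String[] args) throws Exception {
            // Client connection details: server name/IP, port, channel
            MQEnvironment.hostname = "192.168.1.10";        // PC 1's address (placeholder)
            MQEnvironment.port = 1414;
            MQEnvironment.channel = "SYSTEM.ADMIN.SVRCONN"; // or CH.01 once redefined as SVRCONN

            MQQueueManager qmgr = new MQQueueManager("QM.01");
            MQQueue queue = qmgr.accessQueue("Q.01", CMQC.MQOO_INPUT_AS_Q_DEF);

            MQMessage msg = new MQMessage();
            queue.get(msg);                  // throws MQRC 2033 if the queue is empty
            byte[] body = new byte[msg.getDataLength()];
            msg.readFully(body);
            System.out.println(new String(body));

            queue.close();
            qmgr.disconnect();
        }
    }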
When you have two QMgrs that need to communicate, you connect them using MCA channels. On the sending QMgr these include SENDER, SERVER and CLUSTER SENDER channels. On the receiving QMgr these include RECEIVER, REQUESTER or CLUSTER RECEIVER channels. Any of the outbound channels (SDR, SVR or CLUSSDR) requires a transmission queue.
In the example you described there is only one QMgr, therefore no SDR, SVR or CLUSSDR channel is required. You will need to use a SVRCONN channel such as SYSTEM.ADMIN.SVRCONN. You did not mention having defined a listener, but apparently you did, or else you would not have received a 2539 MQRC_CHANNEL_CONFIG_ERROR message. The reason you get 2539 is that you are attempting to connect a client to a channel designed for QMgr-to-QMgr connections. The 4036 is an authorization failure: the connection is not permitted for the user ID presented.
Delete CH.01 and redefine it as a SVRCONN channel.
Alter Q.01 with USAGE=NORMAL
Configure WMQ Explorer to connect to CH.01.
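For reference, the equivalent MQSC commands (run inside runmqsc QM.01) would be roughly:

    * Redefine CH.01 as a client (SVRCONN) channel
    DELETE CHANNEL(CH.01)
    DEFINE CHANNEL(CH.01) CHLTYPE(SVRCONN) TRPTYPE(TCP)
    * Make Q.01 a normal local queue again
    ALTER QLOCAL(Q.01) USAGE(NORMAL)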
As Shashi mentioned, take a look at some of the basic docs. These include...
Introduction to WebSphere MQ
Designing a WebSphere MQ architecture
The Quick Beginnings manuals have been broken up but the main sections are indexed here.
You may also wish to review the WMQ Security Lab for V7.1 and earlier posted at T-Rob.net. Although it is a security lab, it comes with scripts that build the lab environment, including SVRCONN and SDR/RCVR channel pairs, as well as an extensively illustrated lab guide.
Thank you for your response.
Following your indications, I've understood that I don't need two QMgrs as I supposed, but only one, on the sending machine.
Therefore I have changed the queue usage to Normal, deleted the channel, and left the other configuration at the defaults: the SYSTEM.ADMIN.SVRCONN channel and LISTENER.TCP on port 1414 are created automatically.
I've also tried to redefine a channel named CH.01 as a SVRCONN channel (Channel > New > Server-connection channel, then choosing between SYSTEM.ADMIN.SVRCONN, SYSTEM.AUTO.SVRCONN or SYSTEM.DEF.SVRCONN), but unfortunately I wasn't able to "Configure WMQ Explorer to connect to CH.01".
Anyway, every attempt I have made to connect from the second PC now ends with an AMQ4036 error, even though I've set the user ID in the CH.01 MCA properties to my PC administrator and enabled user identification on PC 2 as administrator#PC 1.
What I'm trying to achieve is to replicate an application used by my company, which receives data from a remote queue.
The queue connection details given for testing are: server name/IP, port, and channel name.
This is why I'm trying to replicate it by creating a QMgr on the receiving PC: when I tried with the default test information on my company machines, it worked, creating a QMgr with all the test queues available.
I'm now on holiday and can't get more specific information about my company's settings, but I hope to be able to replicate a configuration like that.
Regards,
Flavio.