I want to create a Nagios Core event handler that runs whenever I stop the Apache service.
The Nagios log shows that the event handler script is being invoked, but the script does not seem to execute.
I am following these documents.
These are the Nagios logs:
SERVICE ALERT: tecmint;HTTP load;CRITICAL;HARD;4;connect to address <ip> and port 80: Connection refused
[1607493385] SERVICE EVENT HANDLER: tecmint;HTTP load;CRITICAL;HARD;4;restart-httpd
Why is Apache not starting?
If you want to monitor and restart Apache on a remote server, you need to use SSH or NRPE. NRPE is preferred in this case, as it is faster and doesn't require an SSH key pair exchange.
Briefly, you would have one master Nagios server and one or more Nagios agents.
The master runs check_nrpe with some arguments to ask the agent to check a service and optionally run an event handler (script), like this:
/usr/local/nagios/libexec/check_nrpe -H agent_IP_Address -c command
where command is something like check_http, which is installed on the agent as a plugin.
The master should have Nagios Core installed.
The agent should have the NRPE agent and the plugins (libexec) installed,
as in this manual:
https://assets.nagios.com/downloads/nagiosxi/docs/Installing_The_XI_Linux_Agent.pdf
The command, host, and service definitions stay on the master.
The script that restarts Apache (the event handler) should be on the agent.
This is a full reference on how to install and configure the NRPE master-agent model:
https://assets.nagios.com/downloads/nagioscore/docs/nrpe/NRPE.pdf
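As a rough illustration of what the event handler script on the agent might look like, here is a minimal Python sketch. It assumes the script receives the service state, state type, and attempt number as arguments (the same three values that appear in the SERVICE EVENT HANDLER log line above), that Apache runs as a systemd unit named httpd, and that the user running the handler may restart it through sudo. The file name restart_httpd.py and the restart command are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of an event handler that restarts Apache.

Assumed (hypothetical) invocation:
    restart_httpd.py STATE STATETYPE ATTEMPT
e.g. restart_httpd.py CRITICAL HARD 4
"""
import subprocess
import sys

# Assumption: Apache is the systemd unit "httpd" and the handler's user
# is permitted to restart it via sudo without a password prompt.
RESTART_CMD = ["sudo", "systemctl", "restart", "httpd"]


def should_restart(state, state_type):
    """Return True only for a confirmed (HARD) CRITICAL state."""
    return state == "CRITICAL" and state_type == "HARD"


def main(argv):
    if len(argv) != 4:
        print("usage: restart_httpd.py STATE STATETYPE ATTEMPT", file=sys.stderr)
        return 2
    state, state_type = argv[1], argv[2]
    if should_restart(state, state_type):
        # Propagate the exit code of the restart command.
        return subprocess.call(RESTART_CMD)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

If the handler shows up in the log but nothing happens, it is worth running the script by hand as the same user the handler runs as, since missing execute permission or missing sudo rights are common reasons for a silent failure.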
I have a Jupyter instance running on a remote server in AWS.
When I try to access it from my local computer via the browser, I always get the following:
I've tried multiple browsers and it's the same thing.
But it gets even stranger: if I fire up a terminal and just ssh into the remote AWS server, nothing else, I can suddenly access the Jupyter instance from my local computer via the browser by just visiting the URL of the notebook.
Any idea what the heck is going on here?
Here's a more detailed description. We have two machines, A (local) and B (remote). On machine B, jupyter-lab is installed using conda.
To access jupyter-lab from my local machine A, I simply start jupyter-lab on machine B on port 80 and then visit the public IP or domain name of machine B in the browser on machine A.
There is no need for ssh tunneling because machine B has a public IP and a domain name associated with it, e.g. machineB.aws.com:80 points to the jupyter-lab instance running on machine B.
The bizarre thing is that visiting machineB.aws.com:80 from the browser on machine A always gives the error "The site can't be reached", unless I ssh from machine A into machine B; then the site is reachable and machineB.aws.com:80 works fine.
Again, there is no ssh tunneling going on here. Simply connecting with ssh from A --> B makes the site reachable?
Clarification
This issue is caused by the fact that I have configured jupyter-lab to run as a user service via systemd. According to the wiki:
"This process will survive as long as there is some session for that user, and will be killed as soon as the last session for the user is closed. When automatic start-up of systemd user instances is enabled, the instance is started on boot and will not be killed."
If I'm not mistaken, I have configured the user service to start automatically on each boot, which leaves me puzzled as to why the process is killed when no user is logged in.
Here's the systemd unit file configuration for jupyter:
[Unit]
Description=Jupyter Lab
[Service]
Type=simple
ExecStart=/home/user/anaconda3/envs/myenv/bin/jupyter-lab
WorkingDirectory=/home/user/
Restart=always
RestartSec=120
[Install]
WantedBy=default.target
It seems that I might have forgotten to issue the following command:
loginctl enable-linger username
This is necessary for a systemd user process to start at boot without a user session and to keep running after the user's last session has been closed. It is mentioned in the wiki and in several other places.
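To confirm whether lingering is actually enabled for the account, one option is to query the Linger property with loginctl show-user. The sketch below wraps that call in Python; "username" is the same placeholder as in the command above, and the helper names are made up for illustration.

```python
import subprocess


def parse_linger(output):
    """Parse the `Linger=yes|no` line printed by `loginctl show-user`."""
    for line in output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "Linger":
            return value.strip() == "yes"
    raise ValueError("no Linger property in loginctl output")


def linger_enabled(username):
    """Ask loginctl whether lingering is enabled for the given user."""
    result = subprocess.run(
        ["loginctl", "show-user", username, "--property=Linger"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if loginctl fails
    )
    return parse_linger(result.stdout)


if __name__ == "__main__":
    # "username" is a placeholder; replace it with the real account name.
    print("linger enabled:", linger_enabled("username"))
```

Note that loginctl may report an error for a user who is neither logged in nor lingering, so this check is best run from an active session of that user.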
I have two questions. My immediate problem is that the Wazuh agent never connects to the Wazuh manager.
A. That makes me wonder: while installing the Wazuh manager, where do we provide the Wazuh manager IP?
B. I registered Windows and RHEL machines as agents, but none of them are able to connect; all agents show the NEVER CONNECTED status.
On Windows, this is the error. I am using port 1515 and TCP.
ERROR: (1216): Unable to connect to 'xx.xxx.105.75': 'A connection
attempt failed because the connected party did not properly respond
after a period of time, or established connection failed because
connected host has failed to respond.'
I even tried changing 1515 to 1519 from the Kibana Wazuh app, and added my agent IP to the whitelist; not sure if that matters.
Answering your questions according to the current version of Wazuh, v3.13.1 as of today:
[A] While installing Wazuh Manager, where do we provide WAZUH MANAGER IP?
When installing the manager you don't have to configure any IP, unless you are configuring cluster mode. The manager IP has to be configured on the agents.
After installing the agent, you have to:
1. Add the manager's IP address to the configuration file /var/ossec/etc/ossec.conf:
<address>MANAGER_IP</address>
2. Register the agent with the manager. The simplest method is:
/var/ossec/bin/agent-auth -m MANAGER_IP
3. Restart the Wazuh agent:
systemctl restart wazuh-agent
Once these steps are applied, your agent should be connected and reporting to the manager.
[B] I registered Windows and RHEL machines as agents but none of them are able to connect - all agents are NEVER CONNECTED status.
After performing the steps above, the agents should connect to the manager. If not, some troubleshooting is needed:
Check that the agent has successfully registered with the manager. On the manager, run /var/ossec/bin/agent_control -l and see whether the agent is listed.
Check that the agents can reach the manager.
By default, Wazuh uses port 1515/TCP for registration and 1514/UDP for communication. Check that connections through these ports are possible (firewall rules, etc.).
To avoid possible problems, check that the manager's version is >= the agent's version.
Check whether /var/ossec/logs/ossec.log contains any errors.
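The connectivity check for the registration port can be sketched in a few lines of Python, run from the agent machine. MANAGER_IP is the same placeholder used above and the function name is made up for illustration. This only tests TCP, so it covers 1515/TCP; a UDP port such as 1514/UDP cannot be verified by a simple connect like this.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    manager_ip = "MANAGER_IP"  # placeholder: replace with the manager's address
    print("1515/TCP reachable:", tcp_port_open(manager_ip, 1515))
```

A False result points at a firewall, a security group, or a manager that is not listening, rather than at the agent configuration itself.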
I hope this information is helpful to you.
Best regards.
A. You will have to edit the ossec.conf file and make sure the MANAGER_IP address is in the right place.
B. After you complete step A, and if ports 1514/1515 are open, you will see your agent on the manager. Do not forget to register your agent with the manager.
I think there are two steps:
1. Edit ossec.conf on the agent and change 'MANAGER_IP' to the real manager IP. This is very important and very easy to forget.
2. Restart the agent.
I wish to run the aria2c RPC server as a daemon so that I can schedule download jobs from my own client using the RPC interface.
If you need an aria2c instance quickly, run the following command:
aria2c --enable-rpc --rpc-listen-all
This command asks aria2c to enable RPC mode and to listen on all network interfaces, which is not ideal for a public-facing server. Note that --enable-rpc alone keeps aria2c in the foreground; add -D (--daemon) to detach it and run it as a daemon.
You may need additional options such as --rpc-user and --rpc-passwd (together), or --rpc-secret, to run aria2c more securely.
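Once the RPC server is running, a client can schedule a download by posting a JSON-RPC request to it. The following is a minimal sketch using only the Python standard library. It assumes aria2c is listening on its default RPC port 6800 on localhost and, optionally, was started with --rpc-secret; the helper names, the example URL, and the secret value are placeholders.

```python
import json
import urllib.request

# Assumption: aria2c runs locally with --enable-rpc on the default port 6800.
RPC_URL = "http://localhost:6800/jsonrpc"


def build_add_uri_request(uri, secret=None, request_id="1"):
    """Build the JSON-RPC payload for the aria2.addUri method."""
    params = []
    if secret is not None:
        # With --rpc-secret, the token is passed as the first parameter.
        params.append("token:" + secret)
    params.append([uri])
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "aria2.addUri",
        "params": params,
    }


def add_uri(uri, secret=None, rpc_url=RPC_URL):
    """Send the request and return the GID of the new download."""
    body = json.dumps(build_add_uri_request(uri, secret)).encode("utf-8")
    req = urllib.request.Request(
        rpc_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["result"]


if __name__ == "__main__":
    # Placeholder URL and secret, for illustration only.
    print(add_uri("https://example.com/file.iso", secret="my-secret"))
```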
I have a number of hosts running different servers. All of them have the Nagios plugins installed. I want to write a script that tells me daily whether all the instances are up and running.
I tried Opsview, but due to certain restrictions I couldn't go ahead with it. I then decided to use the Nagios plugins directly. I thought about NRPE, but it is used to run a plugin remotely (provided you know the address of the host). In my case, I want to know whether someone added a new server overnight, whether some server failed, and which servers are running.
Nagios doesn't do discovery. You configure it with a list of machines and services to check.
Assuming we're talking about cloud servers, AWS can send you a message when a new server is added (see the doc). The message can be delivered through SNS or SQS. These notifications could be read to rebuild your Nagios configuration to match the auto-scaling group.
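The "rebuild your Nagios configuration" step could look roughly like the sketch below, which turns a mapping of host names to addresses into Nagios host definitions. How that mapping is obtained (for example from the SNS or SQS messages) is left out. The function name is made up, and the generic-host template name is an assumption that has to match a template defined in your own configuration.

```python
def render_hosts(hosts, template="generic-host"):
    """Render Nagios host definitions from a {host_name: address} mapping.

    `template` is assumed to be a host template that already exists
    in the local Nagios configuration.
    """
    blocks = []
    for name, address in sorted(hosts.items()):
        blocks.append(
            "define host {\n"
            f"    use        {template}\n"
            f"    host_name  {name}\n"
            f"    address    {address}\n"
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    current = {"web-2": "10.0.0.12", "web-1": "10.0.0.11"}
    print(render_hosts(current))
```

The output would be written to a file in a directory that Nagios includes, followed by a configuration check and a reload of Nagios.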
I have a WAS MQ 7.1 server installed on Windows. My application, running on Unix, tries to connect to this server and reports the error "MQ Connect failed 2195" in the application logs. On debugging the code, I found that the error is thrown while connecting to the queue manager.
I ran netstat on the MQ server port and tried telnet to check whether any connection is being established, but I could not see any connection to the queue manager.
The possible issues could be:
1. Queue manager has not been started
2. Listener not started
3. Initiation queue not started, created, or attached (usually optional, depending on setup)
4. Listening to the wrong Port or IP
5. Firewall stopping traffic to Port or IP
6. Queue Manager not created on destination
7. Not authorized to Queue manager and/or channel and/or queue
8. Trial MQ copy expired.
9. Wrong Queue manager name
10. Wrong channel name or password or queue or queue type
I have done the following to check that it is working:
1. Put a message from Windows to the MQ server, which was successful.
2. Gave authority (setmqaut) on the request queue that was created.
3. There is no firewall between the application and the MQ server.
4. The channel name, queue manager, IP, and port number are correct.
In my Windows MQ setup I have created the server-connection channel and the client-connection channel.
Exported the mqm lib to SHLIB_PATH.
Added the application user to the mqm group, and also the Windows user with which I created the MQ server setup.
Copied AMQCLCHL.TAB to the Unix machine containing the client program.
Exported MQCHLTAB as the table file name.
Exported MQCHLLIB as the path containing the table.
Exported MQSERVER=QMgrName/CHANNEL1/hostname on the client machine.
Please let me know if I am missing anything regarding the connection from the application (on Unix) to the MQ server (on Windows).
2195 is MQRC_UNEXPECTED_ERROR. It indicates something that should not have happened, even if you set things up incorrectly. It may well be accompanied by an FDC file in the errors directory. You should raise a PMR with IBM Service.
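To see whether an FDC file was written around the time of the failure, the errors directory can be listed by modification time. The sketch below assumes the usual Unix location /var/mqm/errors for the client side; the path on your system (and on the Windows server) may differ, and the function name is made up for illustration.

```python
from pathlib import Path

# Assumption: default MQ errors directory on Unix; adjust for your install.
DEFAULT_ERRORS_DIR = "/var/mqm/errors"


def recent_fdc_files(errors_dir=DEFAULT_ERRORS_DIR, limit=5):
    """Return up to `limit` *.FDC files, most recently modified first."""
    files = sorted(
        Path(errors_dir).glob("*.FDC"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return files[:limit]


if __name__ == "__main__":
    for path in recent_fdc_files():
        print(path)
```

Any file found this way, together with the timestamp of the 2195 error, is useful material to attach when raising the PMR.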