How to monitor the crond service in Nagios - CentOS

I want to monitor the crond service in Nagios.
I tried creating the script below and putting it in /usr/local/nagios/libexec/:
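#!/bin/bash
# Nagios plugin: CRITICAL (exit 2) if crond is not running, OK (exit 0) otherwise
CRON_RESULT=$(/etc/init.d/crond status 2>&1)
# Match "is running" rather than "pid": the stopped state
# "crond dead but pid file exists" also contains "pid"
STATUS=$(echo "$CRON_RESULT" | grep "is running")
if [ -z "$STATUS" ]; then
echo "CROND CRITICAL- $CRON_RESULT"
exit 2
else
echo "CROND OK- $CRON_RESULT"
exit 0
fi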
It works fine: when crond is not running it shows CRITICAL, and when it is running it shows OK.
But if the NRPE client is not reachable from the Nagios server, the crond service shows the status as "OK" (in green) with the message "return code of 255 is out of bounds", so I can't tell whether crond is running or not.
Is there any other way to monitor the crond service in CentOS 6.6?
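A slightly stricter variant of the check, as a sketch: it assumes the CentOS 6 SysV wording of /etc/init.d/crond status ("crond (pid N) is running...", "crond is stopped", "crond dead but pid file exists"), and the function name classify_crond is made up for illustration. It uses the standard Nagios plugin return codes, so any text it does not recognise is reported as UNKNOWN (3) instead of OK.

```shell
#!/bin/bash
# Sketch: map the text printed by "/etc/init.d/crond status" (CentOS 6 SysV init)
# to a Nagios plugin state. Plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# classify_crond is a hypothetical helper name.
classify_crond() {
  local status_text=$1
  case "$status_text" in
    *"dead but"*)      # e.g. "crond dead but pid file exists"
      echo "CROND CRITICAL - $status_text"; return 2 ;;
    *"is running"*)    # e.g. "crond (pid  1234) is running..."
      echo "CROND OK - $status_text"; return 0 ;;
    *"is stopped"*)    # e.g. "crond is stopped"
      echo "CROND CRITICAL - $status_text"; return 2 ;;
    *)                 # anything unexpected is not silently OK
      echo "CROND UNKNOWN - $status_text"; return 3 ;;
  esac
}

# Usage inside the plugin script:
#   classify_crond "$(/etc/init.d/crond status 2>&1)"
#   exit $?
```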

From the libexec directory on your Nagios server, manually run check_nrpe against the IP address of the host running the crond service you want to monitor:
[root@joeyoung.io libexec]# pwd
/usr/local/nagios/libexec
[root@joeyoung.io libexec]# ./check_nrpe -H 10.0.0.1
connect to address 10.0.0.1 port 5666: No route to host
Immediately after running check_nrpe, run echo $? to get the return code.
[root@joeyoung.io libexec]# echo $?
255
If you get 255 (or any number other than 0), then the error message you saw when running check_nrpe manually is a symptom of the root cause of your issue.
Can you try running this and reporting back?
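As a defensive extra that does not depend on the NRPE version, a wrapper can clamp out-of-range exit codes so that an unreachable NRPE agent shows up as UNKNOWN rather than as an out-of-bounds code. This is a hypothetical sketch: run_clamped is not part of NRPE or Nagios, and check_crond in the usage line stands for whatever command name you defined in nrpe.cfg.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of NRPE): run a check command and clamp its exit
# code to the Nagios plugin range. Anything above 3 (such as 255 from check_nrpe
# when the host is unreachable) is reported as 3 = UNKNOWN.
run_clamped() {
  local output rc=0
  output=$("$@" 2>&1) || rc=$?
  if [ "$rc" -gt 3 ]; then
    echo "UNKNOWN - exit code $rc from $1: $output"
    return 3
  fi
  echo "$output"
  return "$rc"
}

# Usage (check_crond is a placeholder command name):
#   run_clamped /usr/local/nagios/libexec/check_nrpe -H 10.0.0.1 -c check_crond
```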

I searched and found that it is a bug in NRPE v2.15, so I reverted to NRPE v2.12 on the Nagios server, which resolved the issue.

Related

Bind for 0.0.0.0:50000 failed: port is already allocated on MacOS

I initially ran Jenkins in a Docker container from my macOS terminal with docker-compose up, which generated the long initial admin password. However, after I restarted my machine, the setup vanished. Now, each time I run docker-compose up, with Jenkins port 8080 published on host port 8082 and port 50000 (the Jenkins agent port) on host port 20000 (I previously tried other external ports too), I keep getting the error below:
Creating jenkins ... error
ERROR: for jenkins Cannot start service jenkins: driver failed programming external connectivity on endpoint jenkins (****************************************************): Bind for 0.0.0.0:20000 failed: port is already allocated
ERROR: for jenkins Cannot start service jenkins: driver failed programming external connectivity on endpoint jenkins (****************************************************): Bind for 0.0.0.0:20000 failed: port is already allocated
I have stopped, killed and removed all containers, removed all images and pruned all networks, but nothing seems to work.
What's a way around this and how do I free up allocated ports?
You can find the process that is running on port 20000 using:
lsof:
lsof -nP -iTCP -sTCP:LISTEN | grep <port-number>
or
netstat:
netstat -anv | grep <port-number>
It is probably a leftover process that still holds the port. Kill that process (for example with kill -9 <pid>) and try the same operation again.
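Because grep <port-number> matches substrings (grepping for 2000 also hits 20000), it can help to match the port column exactly. A small sketch, assuming the default lsof column layout; listen_pids is an illustrative name. If you only need the PIDs, lsof -t -iTCP:20000 -sTCP:LISTEN prints them directly.

```shell
#!/bin/bash
# Sketch: print the PIDs listening on exactly one TCP port, reading the output of
#   lsof -nP -iTCP -sTCP:LISTEN
# on stdin. Matching the end of the NAME column avoids false hits such as
# port 20000 when looking for 2000.
listen_pids() {
  local port=$1
  # lsof columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ...
  # For listening sockets NAME looks like "*:20000" or "127.0.0.1:20000".
  awk -v port="$port" 'NR > 1 && $9 ~ (":" port "$") { print $2 }' | sort -u
}

# Usage:
#   lsof -nP -iTCP -sTCP:LISTEN | listen_pids 20000
```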

Busybox wget strange behaviour

I need some help with a really strange problem.
I use wget (BusyBox) to obtain the IP address of some remote clients and use it in a DNS (a sort of "homemade DDNS"). Those clients run a script that calls the following every 5 minutes:
wget -O /dev/null "https://my_dns.org/poll.php?user=User_N&pwd=password_N"
Everything was fine until I updated my HTTP server to remove TLS 1.0/TLS 1.1.
After the update, running the above command from the clients' console still works, but running it automatically (from a script in /etc/init.d) gives this error:
Connecting to my_dns.org (www.xxx.yyy.zzz:443)
wget: error getting response: Connection reset by peer
Any idea why this happens, and how to fix it?
(The shell on the clients runs as root.)
Thank you in advance for your help
Regards

Rye::Box commands failing on remote server

First, I can SSH into the remote server and run the following commands,
cd public_html
du -sh
each of which succeeds and exits with code 0.
When automating the process with Rye::Box with the option safe: false,
rbox.cd :public_html
changes the directory but also returns exit code -1, and
rbox.execute 'du -sh'
fails with the error message "SocketError::getaddrinfo: Name or service not known".
I'd appreciate an explanation if possible.
Check your hosts entry for 127.0.0.1.
You might have to add your hostname to the 127.0.0.1 line in /etc/hosts.
A similar question addresses this issue on SO.
See also
SocketError (getaddrinfo: Name or service not known) - Sunspot/Solr Rails development
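To check this before editing anything, a sketch that looks for a hostname on a 127.0.0.1 line of a hosts-format file. has_loopback_entry is a made-up helper, and it only inspects the file; getent hosts "$(hostname)" is a broader test of whether the name resolves through the system resolver at all.

```shell
#!/bin/bash
# Sketch: succeed if a hosts-format file maps NAME to 127.0.0.1.
# Defaults to /etc/hosts and the machine's own hostname.
#   has_loopback_entry [FILE] [NAME]
has_loopback_entry() {
  local file=${1:-/etc/hosts} name=${2:-$(hostname)}
  # Skip comment lines; on 127.0.0.1 lines, compare every alias with NAME.
  awk -v name="$name" '
    /^[[:space:]]*#/ { next }
    $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit found ? 0 : 1 }
  ' "$file"
}

# Usage:
#   has_loopback_entry || echo "consider adding '127.0.0.1 $(hostname)' to /etc/hosts"
```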

running service elasticsearch start fails, but running the command manually succeeds

Context:
I'm testing an Elasticsearch 1.7.1 configuration that is set up by Chef, and testing it in Test Kitchen.
The Chef script and configuration do work, since they're running in production.
running service elasticsearch start as the elasticsearch user fails, but the actual call it delegates to does not.
From what I've learned, Chef scripts are run as root. So when the test fails (it checks whether Elasticsearch is running with service elasticsearch status), I log into the Vagrant machine. As root, if I run service elasticsearch start, I get an OK (which is incorrect, but that's another issue), and a subsequent service elasticsearch status gives the error: elasticsearch dead but pid file exists
Digging further, I added debug statements to the init.d script that service runs and saw that the actual command is basically a call to the daemon function from init.d/functions, which just calls:
runuser -s /bin/bash elasticsearch -c 'ulimit -S -c 0 >/dev/null 2>&1 ; /usr/share/elasticsearch/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch/ -Des.default.path.data=/data/elasticsearch/data/ -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch/'
So I tried a sudo su - elasticsearch and then ran the part in quotes:
[elasticsearch@default-centos ~]$ ulimit -S -c 0 >/dev/null 2>&1 ; \
/usr/share/elasticsearch/bin/elasticsearch \
-p /var/run/elasticsearch/elasticsearch.pid -d \
-Des.default.path.home=/usr/share/elasticsearch \
-Des.default.path.logs=/var/log/elasticsearch/ \
-Des.default.path.data=/data/elasticsearch/data/ \
-Des.default.path.work=/tmp/elasticsearch \
-Des.default.path.conf=/etc/elasticsearch/
A subsequent service elasticsearch status shows that elasticsearch is running just fine! I've even set the logging to TRACE, and there's no indication that elasticsearch has crashed.
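For context, "dead but pid file exists" means the status check found the pid file but no live process with the PID stored in it. A simplified sketch of that check using the LSB init-script status codes (0 running, 1 dead but pid file exists, 3 not running); pidfile_status is an illustrative name, and this is not the actual code in init.d/functions.

```shell
#!/bin/bash
# Simplified sketch of the idea behind "dead but pid file exists".
# Return codes follow the LSB init-script status convention:
#   0 = running, 1 = dead but pid file exists, 3 = not running (no pid file).
pidfile_status() {
  local pidfile=$1 pid=""
  [ -f "$pidfile" ] || return 3
  # read returns non-zero on a missing trailing newline but still sets pid
  read -r pid < "$pidfile" || true
  # kill -0 sends no signal, it only tests whether the process exists
  # (as a non-root user it also fails for processes owned by someone else)
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    return 0
  fi
  return 1
}

# Usage:
#   pidfile_status /var/run/elasticsearch/elasticsearch.pid; echo $?
```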

How can I wait for a docker container to be up and running?

When running a service inside a container, let's say mongodb, the command
docker run -d myimage
will exit instantly, and return the container id.
In my CI script, I run a client to test mongodb connection, right after running the mongo container.
The problem is: the client can't connect because the service is not up yet.
Apart from adding a big sleep 10 in my script, I don't see any way to wait for a container to be up and running.
Docker has a wait command, but it doesn't work in this case because the container doesn't exist.
Is it a limitation of docker?
I found this simple solution; I've been looking for something better, but no luck:
until [ "$(docker inspect -f '{{.State.Running}}' CONTAINERNAME)" = "true" ]; do
sleep 0.1;
done;
or, if you want to wait until the container reports as healthy (assuming you have a healthcheck):
until [ "$(docker inspect -f '{{.State.Health.Status}}' CONTAINERNAME)" = "healthy" ]; do
sleep 0.1;
done;
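Whichever form of this loop you use, note that [ needs the operator as a separate word. Without spaces, "false"=="true" is a single non-empty string, and a one-argument test is always true, so an until loop written that way exits immediately. A tiny demonstration; the function names are only for illustration.

```shell
#!/bin/bash
# "[" needs the operator as a separate argument. Without spaces,
# "false"=="true" is ONE non-empty string, and [ STRING ] is always true.
pitfall_demo() {
  if [ "false"=="true" ]; then echo "always true"; else echo "compared"; fi
}

# Correct comparison: spaces around "=" make it three arguments.
state_is() {
  [ "$1" = "$2" ]
}

# Usage with the loop from the answer (CONTAINERNAME is a placeholder):
#   until state_is "$(docker inspect -f '{{.State.Running}}' CONTAINERNAME)" true; do
#     sleep 0.1
#   done
```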
As commented in a similar issue for docker 1.12
HEALTHCHECK support is merged upstream as per docker/docker#23218 - this can be considered to determine when a container is healthy prior to starting the next in the order
This is available since docker 1.12rc3 (2016-07-14)
docker-compose is in the process of supporting a functionality to wait for specific conditions.
It uses libcompose (so I don't have to rebuild the docker interaction) and adds a bunch of config commands for this. Check it out here: https://github.com/dansteen/controlled-compose
You can use HEALTHCHECK in a Dockerfile like this:
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/ || exit 1
Official docs: https://docs.docker.com/engine/reference/builder/#/healthcheck
If you don't want to expose the ports, as is the case if you plan to link the container and might be running multiple instances for testing, then I found this was a good way to do it in one line :) This example is based on waiting for ElasticSearch to be ready:
docker inspect --format '{{ .NetworkSettings.IPAddress }}:9200' elasticsearch | xargs wget --retry-connrefused --tries=5 -q --wait=3 --spider
This requires wget to be available, which is standard on Ubuntu. It will retry 5 times, 3 seconds between tries, even if the connection is refused, and also does not download anything.
If the containerized service you started doesn't necessarily respond well to curl or wget requests (which is quite likely for many services) then you could use nc instead.
Here's a snippet from a host script which starts a Postgres container and waits for it to be available before continuing:
POSTGRES_CONTAINER=`docker run -d --name postgres postgres:9.3`
# Wait for the postgres port to be available
until nc -z $(sudo docker inspect --format='{{.NetworkSettings.IPAddress}}' $POSTGRES_CONTAINER) 5432
do
echo "waiting for postgres container..."
sleep 0.5
done
Edit - This example does not require that you EXPOSE the port you are testing, since it accesses the Docker-assigned 'private' IP address for the container. However this only works if the docker host daemon is listening on the loopback (127.x.x.x). If (for example) you are on a Mac and running the boot2docker VM, you will be unable to use this method since you cannot route to the 'private' IP addresses of the containers from your Mac shell.
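The loops in these answers wait forever if the service never comes up, which can hang a CI job. A sketch of a bounded variant that retries any readiness command a fixed number of times; wait_until and its argument order are assumptions, not an existing tool.

```shell
#!/bin/bash
# Sketch: run a readiness command until it succeeds, giving up after a fixed
# number of attempts instead of looping forever.
#   wait_until TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local tries=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= tries; i++ )); do
    if "$@"; then
      return 0
    fi
    # no point sleeping after the last failed attempt
    if (( i < tries )); then
      sleep "$delay"
    fi
  done
  echo "gave up after $tries attempts: $*" >&2
  return 1
}

# Usage with the netcat probe from the answer ($POSTGRES_CONTAINER as there):
#   ip=$(docker inspect --format='{{.NetworkSettings.IPAddress}}' "$POSTGRES_CONTAINER")
#   wait_until 60 0.5 nc -z "$ip" 5432 || exit 1
```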
Assuming that you know the host+port of your MongoDB server (either because you used a -link, or because you injected them with -e), you can just use curl to check if the MongoDB server is running and accepting connections.
The following snippet will try to connect every second until it succeeds:
#!/bin/sh
while ! curl http://$DB_PORT_27017_TCP_ADDR:$DB_PORT_27017_TCP_PORT/
do
echo "$(date) - still trying"
sleep 1
done
echo "$(date) - connected successfully"
I've ended up with something like:
#!/bin/bash
attempt=0
while [ $attempt -le 59 ]; do
attempt=$(( $attempt + 1 ))
echo "Waiting for server to be up (attempt: $attempt)..."
result=$(docker logs mongo 2>&1)
if grep -q 'waiting for connections on port 27017' <<< "$result" ; then
echo "Mongodb is up!"
break
fi
sleep 2
done
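When polling logs, note that $(docker logs ...) on its own captures only the container's stdout, so a marker the service writes to stderr would never be seen. A sketch that matches the marker as a fixed string in the combined output; log_has_marker is an illustrative name, and the marker text is the one from the answer, which may differ between MongoDB versions.

```shell
#!/bin/bash
# Sketch: succeed once the log text on stdin contains the readiness marker.
# -F treats the marker as a fixed string, not a regular expression.
#   log_has_marker MARKER
log_has_marker() {
  grep -qF -- "$1"
}

# Usage (container name "mongo" as in the answer; without "set -o pipefail",
# the loop condition is grep's exit status):
#   until docker logs mongo 2>&1 | log_has_marker 'waiting for connections on port 27017'; do
#     sleep 2
#   done
```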
Throwing my own solution out there:
I'm using docker networks so Mark's netcat trick didn't work for me (no access from the host network), and Erik's idea doesn't work for a postgres container (the container is marked as running even though postgres isn't yet available to connect to). So I'm just attempting to connect to postgres via an ephemeral container in a loop:
#!/bin/bash
docker network create my-network
docker run -d \
--name postgres \
--net my-network \
-e POSTGRES_USER=myuser \
postgres
# wait for the database to come up
until docker run --rm --net my-network postgres psql -h postgres -U myuser; do
echo "Waiting for postgres container..."
sleep 0.5
done
# do stuff with the database...
If you want to wait for an opened port, you can use this simple script:
until </dev/tcp/localhost/32022; do sleep 1; done
This waits until port 32022 accepts connections.
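The bare loop prints a "Connection refused" error on every attempt and never gives up. A bounded sketch using the same bash /dev/tcp redirection; wait_for_port is a hypothetical helper, and /dev/tcp is a bash feature, so it will not work in shells such as dash.

```shell
#!/bin/bash
# Sketch: bounded version of the /dev/tcp loop. Bash opens a TCP connection
# when redirecting to /dev/tcp/HOST/PORT (no such file exists on disk).
#   wait_for_port HOST PORT TRIES
wait_for_port() {
  local host=$1 port=$2 tries=$3 i
  for (( i = 1; i <= tries; i++ )); do
    # Open the connection in a subshell (it closes when the subshell exits);
    # connection errors are silenced.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    if (( i < tries )); then
      sleep 1
    fi
  done
  return 1
}

# Usage:
#   wait_for_port localhost 32022 30 || echo "port 32022 never opened" >&2
```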
I had to tackle this recently and came up with an idea. While researching this task I ended up here, so I thought I'd share my solution with future visitors of this post.
Docker-compose-based solution
If you are using docker-compose you can check out my docker synchronization POC. I combined some of the ideas in other questions (thanks for that - upvoted).
The basic idea is that every container in the compose setup exposes a diagnostic service. Calling this service checks whether the required set of ports is open in the container and returns the overall status of the container (WARMUP or RUNNING in the POC). Each container also has a utility that checks on startup whether the services it depends on are up and running; only then does the container start.
In the example docker-compose environment there are two services, server1 and server2, and a client service that waits for both servers to start, then sends a request to each of them and exits.
Excerpt from the POC
wait_for_server.sh
#!/bin/bash
server_host=$1
sleep_seconds=5
while true; do
echo -n "Checking $server_host status... "
output=$(echo "" | nc $server_host 7070)
if [ "$output" == "RUNNING" ]
then
echo "$server_host is running and ready to process requests."
break
fi
echo "$server_host is warming up. Trying again in $sleep_seconds seconds..."
sleep $sleep_seconds
done
Waiting for multiple containers:
trap 'kill $(jobs -p)' EXIT
for server in $DEPENDS_ON
do
/assets/wait_for_server.sh $server &
wait $!
done
Basic implementation of the diagnostic service (checkports.sh):
#!/bin/bash
for port in $SERVER_PORT; do
nc -z localhost $port;
rc=$?
if [[ $rc != 0 ]]; then
echo "WARMUP";
exit;
fi
done
echo "RUNNING";
Wiring up the diagnostic service to a port:
nc -v -lk -p 7070 -e /assets/checkports.sh
test/test_runner
#!/usr/bin/env ruby
$stdout.sync = true
def wait_ready(port)
until (`netstat -ant | grep #{port}`; $?.success?) do
sleep 1
print '.'
end
end
print 'Running supervisord'
system '/usr/bin/supervisord'
wait_ready(3000)
puts "It's ready :)"
$ docker run -v /tmp/mnt:/mnt myimage ruby mnt/test/test_runner
This is how I test whether the port is listening.
In this case the test runs inside the container, but you can also check from outside whether MongoDB is ready:
$ docker run -p 37017:27017 -d myimage
Then check from the host whether port 37017 is listening.
You can use wait-for-it, "a pure bash script that will wait on the availability of a host and TCP port. It is useful for synchronizing the spin-up of interdependent services, such as linked docker containers. Since it is a pure bash script, it does not have any external dependencies".
However, you should try to design your services to avoid these kinds of interdependencies. Can your service retry the connection to the database? Can you let your container just die if it can't connect to the database and let a container orchestrator (e.g. Docker Swarm) restart it for you?
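Following the advice to let the service reconnect by itself, a sketch of retrying with exponential backoff and then exiting non-zero so that an orchestrator can restart the container. retry_backoff, the BACKOFF_START variable, the host name db and my-service in the usage lines are all placeholders.

```shell
#!/bin/bash
# Sketch of "retry, then give up": run a command with exponential backoff
# (1s, 2s, 4s, ... by default) and return non-zero after MAX_TRIES failures.
#   retry_backoff MAX_TRIES COMMAND [ARGS...]
# BACKOFF_START (hypothetical) overrides the first delay in seconds.
retry_backoff() {
  local max=$1 delay=${BACKOFF_START:-1} attempt
  shift
  for (( attempt = 1; attempt <= max; attempt++ )); do
    "$@" && return 0
    (( attempt == max )) && break
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
  done
  return 1
}

# Usage in an entrypoint (pg_isready ships with the PostgreSQL client tools):
#   retry_backoff 6 pg_isready -h db -p 5432 || exit 1
#   exec my-service
```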
Docker-compose solution
With docker-compose I don't know the name of the container, so I use
docker inspect -f '{{.State.Running}}' $(docker-compose ps -q <SERVICE_NAME>)
and check for true, as in https://stackoverflow.com/a/33520390/7438079
To verify whether a PostgreSQL or MySQL Docker container (the databases currently supported) is up and running, especially for migration tools like Flyway, you can use the wait-for binary: https://github.com/arcanjoaq/wait-for.
For a MongoDB Docker instance we did this, and it works like a charm:
#!/usr/bin/env bash
until docker exec -i ${MONGO_IMAGE_NAME} mongo -u ${MONGO_INITDB_ROOT_USERNAME} -p ${MONGO_INITDB_ROOT_PASSWORD}<<EOF
exit
EOF
do
echo "Waiting for Mongo to start..."
sleep 0.5
done
Here is what I ended up with; it's similar to a previous answer, just a little more concise:
until [[ $(docker logs $db_container_name) == *"waiting for connections on port 27017"* ]]
do
echo "waiting on mongo to boot..."
sleep 1
done
1 : A container attached to a service with docker-compose doesn't launch when a Synology NAS starts up.
I had a problem launching a docker container on a Synology NAS that was attached to another container via docker-compose like this:
...
---
version: "3"
services:
  gluetun:
    image: qmcgaw/gluetun
    container_name: gluetun
    ...
  qbittorrent:
    image: lscr.io/linuxserver/qbittorrent:latest
    container_name: qbittorrent
    # Connect the service to gluetun
    network_mode: "service:gluetun"
    ...
The Docker used by Synology is different or not up to date, and apparently it does not accept a container attached to another container with network_mode: Synology's Docker considers that the container is not attached to any network and therefore cannot launch it. From the command line, however, it works fine, so I wanted a script that launches it automatically at NAS startup through a scheduled task.
Note: I created my docker-compose file with Portainer.
2 : The until loop does not work, no matter how the condition is written.
If, like me, you did not manage to make the until loop described by superhero work on your Synology NAS, you will have to use a while loop instead.
However, when debugging my code with bash's -x option, the string comparison looked correct:
Output line (the same for every way of writing the expression):
...
+ '[' false = true ']'
...
Whatever I tried, nothing worked reliably: I checked every time, and there was always a moment when it did not behave as I wanted.
4: THE SOLUTION FOR SYNOLOGY
Environment
DSM : 7.1.1
bash : 4.4.23
docker : 20.10.3
After finding the right syntax, I had to solve another problem:
The Docker container status check only works if the Synology Docker package is running.
So I used synopkg with is_onoff (is_active doesn't work, and status returned too much text). My solution looks like this:
#!/bin/bash
while [ "$(synopkg is_onoff Docker)" != "package Docker is turned on" ]; do
sleep 0.1;
done;
echo "Docker package is running..."
echo ""
while [ "$(docker inspect -f {{.State.Running}} gluetun)" = "false" ]; do
sleep 0.1;
done;
echo "gluetun is running..."
echo ""
if [ "$(docker ps -a -f status=exited -f name=qbittorrent --format '{{.Names}}')" ]; then
echo "Qbittorrent is not running I try to start this container"
docker start qbittorrent
else
echo "Qbittorrent docker is already started"
fi
With this I was able to create a scheduled task, run as root at boot-up, in the DSM settings, and it worked fine after a reboot. Without checking the Synology Docker package status with synopkg, it did not work.
NOTE
I think the version of Bash in DSM doesn't like the until loop, or interprets it differently. Maybe this solution can also help on systems where bash is older and, for whatever reason, you can't or don't want to update the Bash binaries, to avoid breaking your system.
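For comparison, the bash manual defines until as running the loop body as long as the test commands return a non-zero status, which is the same as while ! test, including in bash 4.4. One real difference in the script above is the condition itself: while [ "$(docker inspect ...)" = "false" ] stops as soon as the output is anything other than false, including empty output when the container does not exist yet, whereas until [ ... = "true" ] keeps waiting. A quick check that the two loop forms behave identically; the function names are illustrative.

```shell
#!/bin/bash
# "until COND" runs the body while COND fails, i.e. exactly "while ! COND".
# Both functions below loop until n reaches 3.
count_until() {
  local n=0
  until [ "$n" -ge 3 ]; do n=$(( n + 1 )); done
  echo "$n"
}

count_while() {
  local n=0
  while ! [ "$n" -ge 3 ]; do n=$(( n + 1 )); done
  echo "$n"
}
```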