Teach Zabbix to monitor service status

I know that Zabbix can monitor any service on a Linux machine via two options:
scan the particular TCP or UDP port the service is bound to
or count the service's processes with proc.num[<processname>]
Both are counter-intuitive, because I can spawn processes with the same executable name and they will deceive Zabbix. I'd prefer to use the standard service <servicename> status or systemctl status name.service tools, but there is no standard way to use them from Zabbix except system.run[cmd].
Could you help me write templates for monitoring a particular service's state? We want to support different OSes, such as CentOS 7 and Ubuntu 14.04 and 16.04. It's a pity, but service <servicename> status behaves completely differently across the listed operating systems.

You can also add the following UserParameters to zabbix_agentd.conf to monitor service status on systemd systems. On non-systemd systems the OS doesn't really track service status; the various init-script "status" actions are often unreliable.
UserParameter=systemd.unit.is-active[*],systemctl is-active --quiet '$1' && echo 1 || echo 0
UserParameter=systemd.unit.is-failed[*],systemctl is-failed --quiet '$1' && echo 1 || echo 0
UserParameter=systemd.unit.is-enabled[*],systemctl is-enabled --quiet '$1' && echo 1 || echo 0
Then, e.g. for sshd status, create an item with a key like:
systemd.unit.is-active[sshd]
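On the trigger side you can then alert whenever the unit is not active. A minimal sketch, assuming classic (pre-4.0) Zabbix trigger syntax and a hypothetical host named myhost:

{myhost:systemd.unit.is-active[sshd].last()}=0

The item should be of type "Zabbix agent" with a numeric (unsigned) value type, so the 0/1 echoed by the one-liners maps cleanly.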

If your Linux services are managed by systemd (CentOS 7+, Ubuntu 16.04+, ...), then you can use https://github.com/cavaliercoder/zabbix-module-systemd. It uses standard systemd D-Bus communication, which is what systemctl does under the hood.

For CentOS 6 it can be done with:
UserParameter=check_service_status_asterisk,sudo service asterisk status 2> /dev/null | grep -q "is running"; echo $?
For CentOS 7 or similar it can be done with:
UserParameter=check_service_status_grafana,systemctl status grafana-server 2> /dev/null | sed -n 3p | grep -q "running"; echo $?
or, parameterized:
UserParameter=check_service_status[*],systemctl status $1 2> /dev/null | sed -n 3p | grep -q "running"; echo $?
(Note these keys return 0 when the service is running and 1 otherwise, the inverse of the is-active keys above.)
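Whichever variant you use, it is worth testing the key before creating items (remember to restart the agent after editing zabbix_agentd.conf). A quick check, with sshd as an example service:

# on the monitored host:
zabbix_agentd -t 'check_service_status[sshd]'
# or remotely, from the Zabbix server:
zabbix_get -s <agent-host> -k 'check_service_status[sshd]'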


Strace daemon process started using service

strace can be used to trace a process by passing it the command, as below:
strace -f -tt -o strace.log -D <SOME_COMMAND>
But the command below fails to trace the syscalls of the started daemon process:
strace -f -tt -o strace.log -D service nginx start
In this case strace only traces the syscalls of /usr/sbin/service and terminates. It does not trace syscalls of the nginx process that is started as a result of service nginx start.
How do I trace a process started by /usr/sbin/service? Specifically looking for a solution for daemon processes only!
Instead of running nginx from service, run service nginx stop and then run
strace nginx -g "daemon off;"
This will make sure you get the trace of the process. The -g "daemon off;" ensures nginx does not run as a daemon; otherwise strace would again end as soon as the foreground process exits.
The service command just activates a process; if you want to strace it, the best approach is to launch the process directly.
If you are still interested in debugging a process started with the service command, do the following:
service nginx start
ps aux | grep nginx
Capture the PID of the nginx process and then attach to it using
strace -p <pid>
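The two steps can be combined into one line; a small sketch, assuming a single nginx master process and pgrep being available:

strace -f -tt -o strace.log -p "$(pgrep -o -x nginx)"

pgrep -o picks the oldest matching process (the master), and -f then follows any children forked after attaching.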
Forking Processes
To trace processes which fork, you need to use the -f flag
strace -f nginx
Service tracing
When you call service nginx start, assuming the system uses systemd, the call gets translated to systemctl start nginx. Now look at the systemd source code:
https://github.com/systemd/systemd/blob/cf45dd36282368d5cdf757cac2cf1fc2b562dab2/src/systemctl/systemctl.c#L3100
r = sd_bus_call_method_async(
        bus,
        NULL,
        "org.freedesktop.systemd1",
        "/org/freedesktop/systemd1",
        "org.freedesktop.systemd1.Manager",
        "Subscribe",
        NULL, NULL,
        NULL);
It doesn't spawn/fork the process. It sends a message to the systemd daemon, which then starts the nginx process.
So, in short: no, you can't strace through your service nginx start command.
Change the ExecStart property of the service to include "strace". For example (tested on Debian Buster):
# grep ExecStart /lib/systemd/system/nginx.service
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
ExecStart=/usr/sbin/nginx -g 'daemon on; master_process on;'
# cd /etc/systemd/system/
# mkdir nginx.service.d
# cat > nginx.service.d/strace.conf <<-EOD
[Service]
ExecStart=
ExecStart=/usr/bin/strace -f -tt -o /tmp/strace.log -D /usr/sbin/nginx -g 'daemon on; master_process on;'
EOD
# systemctl daemon-reload
# systemctl restart nginx.service
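systemctl cat shows the effective unit including drop-ins, which is handy for confirming the override took effect; removing the drop-in directory and reloading reverts everything:

# systemctl cat nginx.service
# rm -r /etc/systemd/system/nginx.service.d
# systemctl daemon-reload
# systemctl restart nginx.service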

How to check if a WildFly Server has started successfully using command/script?

I want to write a script to manage WildFly start and deploy, but I'm having trouble. To check whether the server has started, I found the command
./jboss-cli.sh -c command=':read-attribute(name=server-state)' | grep running
But while the server is starting, the management controller is not yet available, so ./jboss-cli.sh -c fails to connect and returns an error.
Is there a better way to check whether WildFly has started completely?
I found a better solution. The command
netstat -an | grep 9990 | grep LISTEN
checks the state of the management port (9990) before WildFly is ready to accept management commands. After that, use ./jboss-cli.sh -c command=':read-attribute(name=server-state)' | grep running to check whether the server has started. Change the port if your management port config is not the default 9990.
Here is my start & deploy script; the idea is to check repeatedly until the server has started, then use the jboss-cli command to deploy my application. It also prints the log to the screen, so there is no need for another shell to tail the log file.
#!/bin/sh
totalRow=0
printLog(){ # print the new lines of server.log to the screen
    local newTotal=$(awk 'END{print NR}' ./standalone/log/server.log) # quicker than wc -l
    local diff=$(($newTotal-$totalRow))
    tail -n $diff ./standalone/log/server.log
    totalRow=$newTotal
}
nohup bin/standalone.sh >/dev/null 2>&1 &
echo '======================================== Jboss-eap-7.1 is starting now ========================================'
while true # wait until the management port is listening
do
    sleep 1
    if netstat -an | grep 9990 | grep LISTEN
    then
        printLog
        break
    fi
    printLog
done
while true # wait until the server reports "running"
do
    if bin/jboss-cli.sh --connect command=':read-attribute(name=server-state)' | grep running
    then
        printLog
        break
    fi
    printLog
    sleep 1
done
echo '======================================== Jboss-eap-7.1 has started!!!!!! ========================================'
bin/jboss-cli.sh --connect command='deploy /bcms/jboss-eap-7.1/war/myApp.war' &
tail -f -n0 ./standalone/log/server.log
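A side note: on recent distributions netstat may not be installed (it is deprecated in favour of ss from iproute2), so the port check can be swapped for an equivalent like:

ss -ltn | grep -q ':9990'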

Start shrew vpn client (iked & ikec) on start-up of OSMC on Raspberry 2

I would like to connect to a VPN on start-up of OSMC.
Environment:
installed OSMC on a Raspberry Pi 2
downloaded, compiled and installed Shrew Soft VPN on the device
As user 'osmc' via ssh:
sudo iked starts the daemon successfully
ikec -r "test.vpn" -a starts the client, loads the config and connects successfully
rc.local:
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
sudo iked >> /home/osmc/iked.log 2>> /home/osmc/iked.error.log &
ikec -a -r "test.vpn" >> /home/osmc/ikec.log 2>> /home/osmc/ikec.error.log &
exit 0
After the Raspberry Pi starts, iked is visible as a process with ps -e, but ikec is not running. Running /etc/rc.local manually as user osmc starts the script and connects to the VPN successfully.
Problem: why does the script not work correctly on start-up?
Thank you for your help!
I was also looking to do the same thing as you and ran into the same problem. I'm no Linux expert, but I did figure out a workaround.
I created a script called ikec_after_reboot.sh and it looks like this...
$ cat ikec_after_reboot.sh
#!/bin/bash
echo "Starting ikec"
ikec -r test.vpn -a
I then installed cron.
sudo apt-get update
sudo apt-get install cron
Edit the root crontab so it runs the ikec script 60 seconds after reboot:
sudo crontab -e
SHELL=/bin/bash
@reboot sleep 60 && /home/osmc/ikec_after_reboot.sh >> /home/osmc/ikec.log 2>&1
Now edit your /etc/rc.local file and add the following:
sudo iked >> /home/osmc/iked.log 2>> /home/osmc/iked.error.log &
exit 0
Hopefully, this is helpful to you.
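An alternative worth mentioning: OSMC is Debian-based and runs systemd, so instead of rc.local plus cron you could describe the ordering explicitly with a unit file. A rough, untested sketch; the iked path is an assumption (check it with which iked), and it presumes iked stays in the foreground (it was backgrounded with & in rc.local):

# /etc/systemd/system/shrew-vpn.service
[Unit]
Description=Shrew Soft VPN (iked + ikec)
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/sbin/iked
# give the daemon a moment before the client connects
ExecStartPost=/bin/sh -c 'sleep 10; ikec -r "test.vpn" -a'
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl daemon-reload && sudo systemctl enable shrew-vpn.service. The delay before ikec also points at the likely culprit in the rc.local version: ikec ran before iked (or the network) was ready.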

howto: elastic beanstalk + deploy docker + graceful shutdown

Hi great people of Stack Overflow,
We're hosting a Docker container on EB with Node.js-based code running in it. When redeploying our Docker container, we'd like the old one to do a graceful shutdown.
I've found help & guides on how our code can receive the SIGTERM signal produced by the 'docker stop' command. However, further investigation on the EB machine running Docker, at
/opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh
shows that when "flipping" from the current to the newly staged container, the old one is killed with 'docker kill'.
Is there any way to change this behaviour to docker stop? Or, in general, is there a recommended approach to handling graceful shutdown of the old container?
Thanks!
Self-answering, as I've found a solution that works for us.
tl;dr: use .ebextensions scripts to run your script before 01flip; your script makes sure whatever is inside the Docker container shuts down gracefully.
First, your app (or whatever you're running in Docker) has to be able to catch a signal, SIGINT for example, and shut down gracefully upon it. This is totally unrelated to Docker; you can test it anywhere (locally, for example). There is a lot of info on the net about getting this kind of behaviour for different kinds of apps (be it Ruby, Node.js etc.); a minimal illustration follows below.
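To illustrate the mechanism only (not our actual app), a trivial shell sketch of catching SIGINT and exiting cleanly:

#!/bin/sh
# fake worker: loop until SIGINT arrives, then clean up and exit 0
cleanup() {
    echo "caught SIGINT, flushing state and exiting"
    exit 0
}
trap cleanup INT
while true; do
    sleep 1   # stand-in for real work
done

The graceful-shutdown hook below relies on exactly this: processes that exit on their own once signalled.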
Second, your EB/Docker-based project can have a .ebextensions folder that holds all kinds of scripts to execute while deploying. We put two custom scripts into it, gracefulshutdown_01.config and gracefulshutdown_02.config, which look something like this:
# gracefulshutdown_01.config
commands:
  backup-original-flip-hook:
    command: cp -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak
    test: '[ ! -f /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak ]'
  cleanup-custom-hooks:
    command: rm -f 05gracefulshutdown.sh
    cwd: /opt/elasticbeanstalk/hooks/appdeploy/enact
    ignoreErrors: true
and:
# gracefulshutdown_02.config
commands:
  reorder-original-flip-hook:
    command: mv /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/enact/10flip.sh
    test: '[ -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh ]'
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/enact/05gracefulshutdown.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh
      # find currently running docker
      EB_CONFIG_DOCKER_CURRENT_APP_FILE=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_file)
      EB_CONFIG_DOCKER_CURRENT_APP=""
      if [ -f $EB_CONFIG_DOCKER_CURRENT_APP_FILE ]; then
        EB_CONFIG_DOCKER_CURRENT_APP=`cat $EB_CONFIG_DOCKER_CURRENT_APP_FILE | cut -c 1-12`
        echo "Graceful shutdown on app container: $EB_CONFIG_DOCKER_CURRENT_APP"
      else
        echo "NO CURRENT APP TO GRACEFUL SHUTDOWN FOUND"
        exit 0
      fi
      # give graceful kill command to all running .js files (not stats!!)
      docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep'" | awk '{print $1}' | xargs docker exec $EB_CONFIG_DOCKER_CURRENT_APP kill -s SIGINT
      echo "sent kill signals"
      # wait (max 5 mins) until processes are done and terminate themselves
      TRIES=100
      until [ $TRIES -eq 0 ]; do
        PIDS=`docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep'" | awk '{print $1}' | cat`
        echo TRIES $TRIES PIDS $PIDS
        if [ -z "$PIDS" ]; then
          echo "finished graceful shutdown of docker $EB_CONFIG_DOCKER_CURRENT_APP"
          exit 0
        else
          let TRIES-=1
          sleep 3
        fi
      done
      echo "failed to graceful shutdown, please investigate manually"
      exit 1
gracefulshutdown_01.config is a small util that backs up the original 01flip and deletes (if it exists) our custom script.
gracefulshutdown_02.config is where the magic happens. It creates a 05gracefulshutdown enact script and makes sure the flip happens afterwards by renaming 01flip.sh to 10flip.sh.
05gracefulshutdown, the custom script, basically does this:
find the currently running Docker container
find all processes that need to be sent a SIGINT (for us it's processes with 'workers' in their name)
send a SIGINT to the above processes
loop: check whether the processes from before have terminated; keep looping for a limited number of tries; if the tries run out, exit with status 1 and don't continue to 10flip, as manual intervention is needed.
This assumes you only have one Docker container running on the machine, and that you are able to hop on manually to check what's wrong in case it fails (which has never happened for us yet).
I imagine it can also be improved in many ways, so have fun.
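A handy way to verify the in-container behaviour before wiring it into EB is to send the signal by hand, assuming the app runs as PID 1 in the container (the container name here is just an example):

docker kill --signal=SIGINT my-app-container
docker logs -f my-app-container

If the app handles SIGINT properly you should see your shutdown logging, and the container should stop on its own.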

How to get the base package install location on Linux?

I am on Linux (CentOS). I understand that "rpm -qa" gives a lot of install paths for the corresponding package. However, I need just the base package install location. Is there any way/command/option in Linux to retrieve it? My code snippet below retrieves the list of running services and the corresponding installed package:
for i in $(service --status-all | grep -v "not running" | grep -E running\|stopped | awk '{print $1}');
do
    packagename=$(rpm -qf /etc/init.d/$i)
    servicestatus=$(service --status-all | grep $i | awk '{print $NF}' | sed 's/...//g' | sed 's/.//g')
    # append (>>) so each service adds a row instead of overwriting the file
    echo $tdydate, $(ip route get 8.8.8.8 | awk 'NR==1 {print $NF}'), $i, $packagename, $servicestatus >> "$HOME/MyLog/running_services.csv"
done
Now, I also need to get the install location of the package hosting each running service. Is there a way to retrieve this along with the package names?
Thanks in advance.
Okay, with your answer to my question in the comments, which is much clearer to me than your initial question...
"Hi, basically what I need is: I get a list of all installed services on my CentOS using service --status-all. Now, for each service, I need to know the corresponding application package location on Linux."
...I'll propose this (tested here on CentOS 6.6):
#!/bin/bash
for i in `chkconfig --list | awk '{ print $1}'`; do
    service $i status >/dev/null 2>&1
    if [ $? -eq 0 ]; then
        rpm -qf /etc/init.d/$i
    fi
done | sort | uniq
That spits out the rpm names of all services which are currently running.
A bit more detail as to why your current approach is not going to work:
service --status-all is not going to return information which can be parsed reliably. For example, the output on a VM here:
acpid (pid 872) is running...
auditd (pid 789) is running...
Stopped
cgred is stopped
Checking for service cloud-init:Checking for service cloud-init:Checking for service cloud-init:Checking for service cloud-init:crond (pid 1088) is running...
ip6tables: Firewall is not running.
iptables: Firewall is not running.
Kdump is not operational
mdmonitor is stopped
netconsole module not loaded
Configured devices:
lo eth0
Currently active devices:
lo eth0
ntpd (pid 997) is running...
master (pid 1076) is running...
rdisc is stopped
restorecond is stopped
rsyslogd (pid 809) is running...
sandbox is stopped
saslauthd is stopped
openssh-daemon (pid 988) is running...
Some services don't even return their name (third line); some say stopped, others not running. If you parse the first column of chkconfig --list you know all the service names, which correspond to files in /etc/init.d. Then you can query their status individually and read the return code ($?), which is 0 for running services (or generally for success in the Unix/Linux world) and 1 or higher for services that are not running, not installed, or malfunctioning.
Armed with the names in /etc/init.d/ you can then query the owning package with rpm -qf /etc/init.d/<servicename> and get exactly what I think you were looking for.
Edit: added | sort | uniq after the loop, because some packages contain multiple services; cloud-init, for example, creates four different services on CentOS. So you sort the list, then make sure you only get distinct (uniq) names back.
Works for me:
acpid-1.0.10-2.1.el6.x86_64
audit-2.3.7-5.el6.x86_64
cloud-init-0.7.5-10.el6.centos.2.x86_64
cronie-1.4.4-12.el6.x86_64
cyrus-sasl-2.1.23-15.el6_6.1.x86_64
initscripts-9.03.46-1.el6.centos.1.x86_64
iptables-1.4.7-14.el6.x86_64
iptables-ipv6-1.4.7-14.el6.x86_64
iputils-20071127-17.el6_4.2.x86_64
kexec-tools-2.0.0-280.el6.x86_64
libcgroup-0.40.rc1-15.el6_6.x86_64
mdadm-3.3-6.el6.x86_64
ntp-4.2.6p5-1.el6.centos.x86_64
ntpdate-4.2.6p5-1.el6.centos.x86_64
openssh-server-5.3p1-104.el6_6.1.x86_64
policycoreutils-2.0.83-19.47.el6_6.1.x86_64
postfix-2.6.6-6.el6_5.x86_64
rsyslog-5.8.10-9.el6_6.x86_64
udev-147-2.57.el6.x86_64
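If by "base package install location" you mean the actual paths a package put on disk, note that most RPMs are not relocatable and have no single install prefix; the closest thing is the package's file list from rpm -ql. A sketch building on the loop above:

for i in `chkconfig --list | awk '{ print $1}'`; do
    if service $i status >/dev/null 2>&1; then
        pkg=$(rpm -qf /etc/init.d/$i)
        echo "== $pkg"
        rpm -ql "$pkg"   # every path the package installed
    fi
done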
You can also use rpm -q --whatprovides, which resolves the package that provides a given file. Tweaking your example...
for i in $(chkconfig --list | awk '{ print $1}'); do service $i status >/dev/null 2>&1; if [ $? -eq 0 ]; then echo -n "$i: "; rpm -q --whatprovides /etc/init.d/$i; fi; done | sort
FYI - this doesn't work on more modern systemd-based systems (CentOS 7).
Example on my Fedora 21 box:
Note: This output shows SysV services only and does not include native
systemd services. SysV configuration data might be overridden by native
systemd configuration.
If you want to list systemd services use 'systemctl list-unit-files'.
To see services enabled on particular target use
'systemctl list-dependencies [target]'.
netconsole: initscripts-9.56.1-5.fc21.x86_64
network: initscripts-9.56.1-5.fc21.x86_64
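For completeness, a rough systemd-era equivalent (CentOS 7+), sketched but untested: list the running service units and map each unit file back to its owning package.

for u in $(systemctl list-units --type=service --state=running --no-legend | awk '{print $1}'); do
    f=$(systemctl show -p FragmentPath "$u" | cut -d= -f2)   # path of the unit file
    [ -n "$f" ] && rpm -qf "$f"
done | sort -u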