systemd: recover a service from a systemctl hang

I have a custom systemd ".service" file, like this one:
[Unit]
Description=MyStuff
After=syslog.target
After=network.target
After=local-fs.target
After=remote-fs.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/MyStuff/start
ExecReload=/bin/kill $MAINPID && /usr/bin/cat /var/run/MyStuff.pid | /usr/bin/xargs --no-run-if-empty /bin/kill
ExecStop=/bin/kill $MAINPID && /usr/bin/cat /var/run/MyStuff.pid | /usr/bin/xargs --no-run-if-empty /bin/kill
ExecStartPost=/usr/local/MyStuff/after_start.sh
PIDFile=/var/run/MyStuff.pid
#RemainAfterExit=yes
TimeoutSec=300
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
It worked fine until a few minutes ago, when a systemctl stop MyStuff hung like this:
[root@Live-1 ffmpeg]# systemctl status MyStuff
● MyStuff.service - MyStuff
Loaded: loaded (/usr/lib/systemd/system/MyStuff.service; enabled; vendor preset: disabled)
Active: deactivating (final-sigkill) (Result: timeout) since vie 2017-07-28 11:12:23 -03; 23min ago
Main PID: 25149
CGroup: /system.slice/MyStuff.service
jul 28 11:12:23 Live-1 kill[10382]: kill: cannot find process "/usr/bin/cat"
jul 28 11:12:23 Live-1 kill[10382]: kill: cannot find process "/var/run/MyStuff.pid"
jul 28 11:12:23 Live-1 kill[10382]: kill: cannot find process "|"
jul 28 11:12:23 Live-1 kill[10382]: kill: cannot find process "/usr/bin/xargs"
jul 28 11:12:23 Live-1 kill[10382]: kill: cannot find process "--no-run-if-empty"
jul 28 11:12:23 Live-1 kill[10382]: kill: cannot find process "/bin/kill"
jul 28 11:12:23 Live-1 systemd[1]: MyStuff.service: control process exited, code=exited status=1
jul 28 11:17:24 Live-1 systemd[1]: MyStuff.service stop-sigterm timed out. Killing.
jul 28 11:22:24 Live-1 systemd[1]: MyStuff.service still around after SIGKILL. Ignoring.
jul 28 11:31:22 Live-1 systemd[1]: MyStuff.service stop-final-sigterm timed out. Killing.
I know there are errors reported there; that is not the problem, and I'm not asking about those errors. The problem is that systemd/systemctl hangs in that state, and I'm unable to start, stop, or restart the service because every such action waits forever.
What I need is a way out of that state. This time I just rebooted the server, because it was a development environment, but I can't handle it that way in production.
So: when systemd/systemctl hangs this way, what can I do to recover my service from this state without rebooting the server?
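For context, the commands I would normally reach for are along these lines, though I don't know whether they still respond once the unit is stuck in final-sigkill (MyStuff.service is just the unit name from above):
# send SIGKILL to every process left in the unit's cgroup
systemctl kill --signal=SIGKILL MyStuff.service
# clear the unit's remembered failed/timed-out state
systemctl reset-failed MyStuff.service
# then try a normal start again
systemctl start MyStuff.service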
Thanks in advance.

Related

systemd service not starting on boot, starts when I restart it

I have made this service file to start a Python script when my Raspberry Pi (4) boots up:
/etc/systemd/system/plants.service
[Unit]
Description=plant-sender
After=network.target
[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/home/theo/Repos/plants-monitor/remote
ExecStart=/usr/bin/python main.py
Restart=on-failure
[Install]
WantedBy=multi-user.target
However, once the Pi is on, I run sudo systemctl status plants and get:
* plants.service - plant-sender
Loaded: loaded (/etc/systemd/system/plants.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2020-03-30 20:22:43 EDT; 1min 45s ago
Process: 323 ExecStart=/usr/bin/python main.py (code=exited, status=1/FAILURE)
Main PID: 323 (code=exited, status=1/FAILURE)
Mar 30 20:22:43 arpi systemd[1]: plants.service: Scheduled restart job, restart counter is at 5.
Mar 30 20:22:43 arpi systemd[1]: Stopped plant-sender.
Mar 30 20:22:43 arpi systemd[1]: plants.service: Start request repeated too quickly.
Mar 30 20:22:43 arpi systemd[1]: plants.service: Failed with result 'exit-code'.
Mar 30 20:22:43 arpi systemd[1]: Failed to start plant-sender.
But, after running sudo systemctl restart plants, the service starts up and everything is fine.
If it doesn't start on boot but does on systemctl restart, I'd be looking at whether /home/theo/Repos/plants-monitor/remote is mounted at that point.
There may be something automounting or home-mounting your home directory when you log in.
If so, you could change the working directory to somewhere that always exists, even if only as a test.
Additionally, using journalctl -n 9999 -u plants will get you more log messages, so you can see why it's failing, rather than just seeing the "tried too many times, giving up" messages.
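If it does turn out that the directory isn't mounted yet at boot time, one option (just a sketch, reusing the path from the unit above) is to declare the mount dependency explicitly in the [Unit] section:
[Unit]
# order the service after, and require, the mount that holds the working directory
RequiresMountsFor=/home/theo/Repos/plants-monitor/remote
After editing, run systemctl daemon-reload and re-test with a reboot.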

Cannot start Zookeeper service on CentOS 7

When trying to start the zookeeper service I get the following:
● zookeeper.service
Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2020-04-02 16:19:24 EDT; 5min ago
Process: 5201 ExecStop=/usr/local/kafka/kafka_2.13-2.4.1/bin/zookeeper-server-stop.sh (code=exited, status=1/FAILURE)
Process: 4882 ExecStart=/usr/local/kafka/kafka_2.13-2.4.1/bin/zookeeper-server-start.sh /usr/local/kafka/kafka_2.13-2.4.1/config/zookeeper.properties (code=exited, status=127)
Main PID: 4882 (code=exited, status=127)
Apr 02 16:19:24 centos.localdomain systemd[1]: Started zookeeper.service.
Apr 02 16:19:24 centos.localdomain systemd[1]: zookeeper.service: main process exited, code=exited, status=127/n/a
Apr 02 16:19:24 centos.localdomain systemd[1]: zookeeper.service: control process exited, code=exited status=1
Apr 02 16:19:24 centos.localdomain systemd[1]: Unit zookeeper.service entered failed state.
Apr 02 16:19:24 centos.localdomain systemd[1]: zookeeper.service failed.
The zookeeper.service file is configured as follows
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=specadmin
ExecStart=/usr/local/kafka/kafka_2.13-2.4.1/bin/zookeeper-server-start.sh /usr/local/kafka/kafka_2.13-2.4.1/config/zookeeper.properties
ExecStop=/usr/local/kafka/kafka_2.13-2.4.1/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
When running zookeeper manually as the same user configured in the service file, everything works fine.
Please advise
It turns out the issue was related to the environment variables systemd uses.
systemd runs services with a fixed $PATH, and changes made in /etc/profile, /etc/bashrc, and the like are not applied to it.
Zookeeper runs java, which needs to be on the search path, but since systemd doesn't read the files where that path is set, the zookeeper start script couldn't find java.
I solved it by overriding the search path with an Environment=PATH=... parameter in the zookeeper service file, listing all the required directories.
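For illustration, an override along these lines is what that looks like; the Java directory here is only a placeholder, since the real JDK location depends on the installation:
# systemctl edit zookeeper   (creates /etc/systemd/system/zookeeper.service.d/override.conf)
[Service]
Environment="PATH=/usr/lib/jvm/java/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Afterwards, systemctl daemon-reload and systemctl restart zookeeper pick up the change.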

Error creating a systemd service with a live socket

I'm trying to create a systemd service on CentOS 7.5 to access Livestatus remotely through a socket proxy.
File proxy-to-livestatus.service:
[Unit]
Requires=naemon.service
After=naemon.service
[Service]
ExecStart=/usr/lib/systemd/systemd-socket-proxyd /run/naemon/live
File proxy-to-livestatus.socket:
[Unit]
StopWhenUnneeded=true
[Socket]
ListenStream=6557
Status:
systemctl status proxy-to-livestatus.service
● proxy-to-livestatus.service
Loaded: loaded (/etc/systemd/system/proxy-to-livestatus.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since mié 2018-07-18 09:11:58 CEST; 15s ago
Process: 3203 ExecStart=/usr/lib/systemd/systemd-socket-proxyd /run/naemon/live (code=exited, status=1/FAILURE)
Main PID: 3203 (code=exited, status=1/FAILURE)
jul 18 09:11:58 chuwi systemd[1]: Started proxy-to-livestatus.service.
jul 18 09:11:58 chuwi systemd[1]: Starting proxy-to-livestatus.service...
jul 18 09:11:58 chuwi systemd-socket-proxyd[3203]: Didn't get any sockets passed in.
jul 18 09:11:58 chuwi systemd[1]: proxy-to-livestatus.service: main process exited, code=exited, status=1/FAILURE
jul 18 09:11:58 chuwi systemd[1]: Unit proxy-to-livestatus.service entered failed state.
jul 18 09:11:58 chuwi systemd[1]: proxy-to-livestatus.service failed.
To resolve this issue, we have to enable the socket with the --now option:
systemctl enable --now proxy-to-livestatus.socket
and then start the service:
systemctl start proxy-to-livestatus.service
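If systemctl enable complains that the unit has no installation config, that is because the proxy-to-livestatus.socket shown above has no [Install] section; the usual addition for a socket unit is:
[Install]
WantedBy=sockets.target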
Regards

Celery daemonization: celery.service: Failed at step USER spawning /home/mike/movingcollage/movingcollageenv/bin/celery: No such process

When I do journalctl -f after systemctl start celery.service I get
Mar 21 19:14:21 ubuntu-2gb-nyc3-01 systemd[1]: Reloading.
Mar 21 19:14:21 ubuntu-2gb-nyc3-01 systemd[1]: Started ACPI event daemon.
Mar 21 19:14:25 ubuntu-2gb-nyc3-01 systemd[21431]: celery.service: Failed at step USER spawning /home/mike/movingcollage/movingcollageenv/bin/celery: No such process
Mar 21 19:14:25 ubuntu-2gb-nyc3-01 systemd[1]: Starting celery service...
Mar 21 19:14:25 ubuntu-2gb-nyc3-01 systemd[1]: celery.service: Control process exited, code=exited status=217
Mar 21 19:14:25 ubuntu-2gb-nyc3-01 systemd[1]: Failed to start celery service.
Mar 21 19:14:25 ubuntu-2gb-nyc3-01 systemd[1]: celery.service: Unit entered failed state.
Mar 21 19:14:25 ubuntu-2gb-nyc3-01 systemd[1]: celery.service: Failed with result 'exit-code'.
This is my celery.service configuration:
[Unit]
Description=celery service
After=network.target
[Service]
#PIDFile=/run/celery/pid
Type=forking
User=celery
Group=celery
#RuntimeDirectory=celery
WorkingDirectory=/home/mike/movingcollage
ExecStart=/home/mike/movingcollage/movingcollageenv/bin/celery multi start 3 -A movingcollage "-c 5 -Q celery -l INFO"
ExecReload=/home/mike/movingcollage/movingcollageenv/bin/celery multi restart 3
ExecStop=/home/mike/movingcollage/movingcollageenv/bin/celery multi stopwait 3
[Install]
WantedBy=multi-user.target
Does anyone know what is wrong? Thanks in advance
For celery multi I think it is better to use Type=oneshot. Celery can start many worker processes, and each will have its own PID.
I start my celery like this:
celery multi start 2\
-A my_app_name\
--uid=1001 --gid=1001\
-f /var/log/celery/celery.log\
--loglevel="INFO"\
--pidfile:1=/run/celery1.pid\
--pidfile:2=/run/celery2.pid
Of course in your case uid, gid and all paths will be different.
You need to change:
User=celery
Group=celery
to your user and group, in my case:
User=ubuntu
Group=ubuntu
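Putting the two points together, a minimal sketch of the [Service] section could look like this (paths and the three-worker setup are taken from the question, and the account is swapped for one that actually exists):
[Service]
# celery multi start exits after forking the workers, so oneshot plus
# RemainAfterExit keeps the unit marked active afterwards
Type=oneshot
RemainAfterExit=yes
User=ubuntu
Group=ubuntu
WorkingDirectory=/home/mike/movingcollage
ExecStart=/home/mike/movingcollage/movingcollageenv/bin/celery multi start 3 -A movingcollage -c 5 -Q celery -l INFO
ExecStop=/home/mike/movingcollage/movingcollageenv/bin/celery multi stopwait 3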

Filebeat Service will not start on RHEL 7

I have a problem with my Filebeat installation.
When I try to start it with "service filebeat start", it says "Starting Filebeat". After "service filebeat status" I get 4 PIDs (up to here everything looks "normal"):
[root@(Server) run]# service filebeat status
Filebeat is running with pid: 30650 30657 30658 30659
But after checking the PID, we see that it is not running:
[root@(Server) run]# ps -ef | grep 30650
root 30665 31360 0 16:27 pts/0 00:00:00 grep --color=auto 30650
Trying to start it with systemctl doesn't help:
[root@(Server) run]# systemctl start filebeat
Job for filebeat.service failed because a configured resource limit was exceeded. See "systemctl status filebeat.service" and "journalctl -xe" for details.
Status says:
[root@Server run]# systemctl status filebeat
● filebeat.service - LSB: start and stop filebeat
Loaded: loaded (/etc/rc.d/init.d/filebeat; bad; vendor preset: disabled)
Active: failed (Result: resources) since Tue 2017-09-26 16:30:33 CEST; 1min 41s ago
Docs: man:systemd-sysv-generator(8)
Process: 32118 ExecStart=/etc/rc.d/init.d/filebeat start (code=exited, status=0/SUCCESS)
Sep 26 16:30:33 Server... systemd[1]: Starting LSB: start and stop filebeat...
Sep 26 16:30:33 Server... filebeat[32118]: Starting Filebeat
Sep 26 16:30:33 Server... su[32119]: (to user) root on none
Sep 26 16:30:33 Server... systemd[1]: PID file /var/run/filebeat.pid not readable (yet?) after start.
Sep 26 16:30:33 Server... systemd[1]: Failed to start LSB: start and stop filebeat.
Sep 26 16:30:33 Server... systemd[1]: Unit filebeat.service entered failed state.
Sep 26 16:30:33 Server... systemd[1]: filebeat.service failed.
Does somebody have any idea?
Regards
Problem was "chown permissions". I installed filebeat not as root and the "data" directory had root user & group ownership. After changing that, it runs and starts automatically after boot.
Regards