Systemd - always have a service running and reboot if service stops more than X times

I need a systemd service that runs continuously. The system in question is an embedded Linux built with Yocto.
If the service stops for any reason (either a failure or a normal exit), it should be restarted automatically.
If it is restarted more than X times, the system should reboot.
What options are there for this? I can think of the following two, but both seem suboptimal:
1) a cron job that performs the check above and keeps the retry count somewhere in /tmp or another tmpfs
2) having the service itself track the number of times it has been started (again in some tmpfs location) and reboot if necessary. Systemd would just have to keep trying to start the service whenever it is not running
Edit: as suggested by an answer, I modified the service to use StartLimitAction as shown below. The unit restarts correctly, but the system never reboots, even if I keep killing the script:
[Unit]
Description=myservice system
[Service]
Type=simple
WorkingDirectory=/home/root
ExecStart=/home/root/start_script.sh
Restart=always
StartLimitAction=reboot
StartLimitIntervalSec=600
StartLimitBurst=5
[Install]
WantedBy=multi-user.target

This in your service file should do something very close to your requirements:
[Service]
Restart=always
[Unit]
StartLimitAction=reboot
StartLimitIntervalSec=60
StartLimitBurst=5
It will restart the service whenever it stops, unless there are more than 5 restarts within 60 seconds, in which case it will reboot the system.
You may also want to look at WatchdogSec=, but this software watchdog requires support from the service itself (it is very easy to add, though; see the documentation for WatchdogSec=).
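As a rough sketch of the watchdog variant, the fragment below adds WatchdogSec= to the settings above. The 30-second timeout is an assumed value for illustration, and the ExecStart= path is taken from the question. The service is assumed to send WATCHDOG=1 through sd_notify(3) more often than the timeout; otherwise systemd treats it as hung.

```ini
[Unit]
StartLimitAction=reboot
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Path taken from the question
ExecStart=/home/root/start_script.sh
# Assumed timeout: the service must send WATCHDOG=1 via sd_notify(3)
# more often than this, or systemd considers it hung and terminates it
WatchdogSec=30
# A watchdog timeout then goes through the normal restart logic
Restart=always
```

The timeout is passed to the service in the WATCHDOG_USEC environment variable, so the program does not need to hard-code it. If the notifications come from a child process rather than the main process, NotifyAccess= may need to be widened.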

My understanding is that the Restart= line should be in [Service], as in the example, but the StartLimitxxxxx= lines should be in [Unit].
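Applied to the unit from the question, that split might look like the sketch below. It only moves the three StartLimit lines into [Unit]; all values are the ones from the question. Whether this alone fixes the missing reboot depends on the systemd version in the Yocto image, which the question does not state.

```ini
[Unit]
Description=myservice system
# Start-rate limiting is configured per unit, so these go in [Unit]
StartLimitIntervalSec=600
StartLimitBurst=5
StartLimitAction=reboot

[Service]
Type=simple
WorkingDirectory=/home/root
ExecStart=/home/root/start_script.sh
# Restart on every exit, clean or not
Restart=always

[Install]
WantedBy=multi-user.target
```

After editing the file, run systemctl daemon-reload so that systemd picks up the change.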

Related

How can I program an automatic restart of a process in a cgroup of systemctl?

I have an apache2 service managed by systemctl running on my server. It has a number of subprocesses grouped in a cgroup. Under a heavier workload, one of these processes can fail. In particular, it is /usr/sbin/fcgi-pm (I am running a genome browser (Gbrowse2)).
I want the process to be restarted automatically if it fails. Everything works again once I restart the Apache2 service, but I have to do that manually. Where is the Apache2 or systemd setting that triggers a restart when a subprocess fails?
Thanks in advance guys.

How to set PIDFile for systemd when the main process starts multiple children and exits?

Environment: Ubuntu 16.04, daemon written in C, using systemd for process management.
So I have this unit file:
[Unit]
Description=Fantastic Service
After=network.target
[Service]
Restart=always
Type=forking
ExecStart=/opt/fan/tastic
[Install]
WantedBy=multi-user.target
In my tastic.c code, the main process basically fork()s X children, each using SO_REUSEPORT, and then exits, leaving the children to handle requests.
With the above setup it works fine, and I get the expected behavior.
However, if I put PIDFile= in the service unit file, I am told that the PID provided by my application does not exist, which is true, since my main process exits after starting the requested number of children.
The systemd documentation clearly states that with Type=forking you should provide PIDFile=, but how am I supposed to provide a single PID file when there are multiple children and the parent process exits once they have started?
Am I missing something?

As you found, the service works fine without PIDFile= in your case. The docs recommend PIDFile=, but I believe that is meant for the case where there is a single main process, which doesn't apply here.
Also see man systemd.kill which explains how processes will be killed. The default is "control-group", which kills "all remaining processes in the control group".
So by default, systemd is going to clean up all those child processes at "stop" time for you, which is what you want.
Someone who does have a main process might want to use KillMode=process, and in that case setting PIDFile= may help, but this does not apply to your case.
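Put together, a unit for this setup could look like the sketch below. It is the unit from the question with the default kill mode spelled out for clarity; the KillMode= line is optional, since control-group is already the default described in systemd.kill.

```ini
[Unit]
Description=Fantastic Service
After=network.target

[Service]
Type=forking
ExecStart=/opt/fan/tastic
Restart=always
# No PIDFile=: the parent exits by design and there is no single main process
# Default value, written out: stopping the unit kills every process
# remaining in the service's control group, i.e. all forked children
KillMode=control-group

[Install]
WantedBy=multi-user.target
```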

Systemd, how to mount a device at boot, but disable automount after boot

I cannot seem to find a simple solution to the following problem:
I have a device listed in fstab that should get mounted at boot. But if I manually unmount/remove the device after boot and present it again later, systemd sees the device and automatically mounts it.
How can I prevent the latter (i.e. get the pre-systemd behaviour)? I cannot use noauto in /etc/fstab, since that would disable mounting at boot, which I still want.
There are ways to work around systemd for this problem, but I would like to see it solved using systemd itself.
After some digging, it seems that the fstab systemd generator creates device units and mount units. The generator seems to add implicit settings to the generated device unit, one of them a "Wants" on the mount unit, which creates a dependency between the device and the mount. How can I influence or override the systemd generators so that this "Wants" dependency is not created?
systemctl show dev-mapper-test.device | grep -i wants
Wants=mnt-test.mount
But now the tricky part: even if that "Wants" could be overridden, mounting at boot would then be disabled as well...
Thanks

You can write a systemd unit with Type=oneshot, which is useful for scripts that do a single job and then exit.
Example:
[Unit]
Description=one_mount
After=network.target
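[Service]
Type=oneshot
# Keep the unit active after mount exits so that ExecStop= runs only when the unit is stopped
RemainAfterExit=yes
ExecStart=/usr/bin/mount /dev/partition /path/to/point
ExecStop=/usr/bin/umount /path/to/point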
[Install]
WantedBy=multi-user.target

How to set up a systemd service to retry 5 times on a cycle of 30 seconds

I want systemd to start a script and retry a maximum of 5 times, 30s apart.
Reading the systemd.service manual and searching the Internet didn't produce any obvious answers.

To allow a maximum of 5 retries, 30 seconds apart, use the following options in the relevant systemd service file:
[Unit]
StartLimitInterval=200
StartLimitBurst=5
[Service]
Restart=always
RestartSec=30
This worked for a service that runs a script using Type=idle. Note that StartLimitInterval must be greater than RestartSec * StartLimitBurst (here 30 s * 5 = 150 s, below the 200 s interval); otherwise the restarts never come fast enough to hit the limit and the service will be restarted indefinitely. The service is considered failed once it has been restarted StartLimitBurst times within StartLimitInterval.
See https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=interval and https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=
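For reference, a complete unit built around these options might look like the sketch below. The description and the script path are hypothetical placeholders, and the interval uses the StartLimitIntervalSec= spelling from the linked systemd.unit documentation.

```ini
[Unit]
# Hypothetical description
Description=Script with limited retries
# 5 restarts * 30 s = 150 s, which fits inside the 200 s window,
# so the limit can actually be reached
StartLimitIntervalSec=200
StartLimitBurst=5

[Service]
Type=idle
# Hypothetical path; replace with the real script
ExecStart=/usr/local/bin/my-script.sh
Restart=always
# Wait 30 s between the script exiting and the next start
RestartSec=30

[Install]
WantedBy=multi-user.target
```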

systemd service startup issue

This is the first time I've used systemd and I'm a bit unsure about something.
I've set up a service (for geoserver running under tomcat):
[Unit]
Description=Geoserver
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/local/geoserver/bin/startup-optis.sh
ExecStop=/usr/local/geoserver/bin/shutdown-optis.sh
User=geoserver
[Install]
WantedBy=multi-user.target
The startup script does an exec to run java/tomcat. Starting the service from the command line appears to work:
sudo systemctl start geoserver
However, the command does not return until I press Ctrl-C, which doesn't seem right to me. The java process keeps running afterwards, though, and functions normally. I'm reluctant to reboot the box to test this, in case it causes problems during init: it's a remote machine, and it would be a pain to get someone to fix it.

You need to set the correct Type= in the [Service] section:
[Service]
...
Type=simple
...
Type
Configures the process start-up type for this service unit. One of simple, forking, oneshot, dbus, notify or idle.
If set to simple (the default if neither Type= nor BusName=, but
ExecStart= are specified), it is expected that the process configured
with ExecStart= is the main process of the service. In this mode, if
the process offers functionality to other processes on the system, its
communication channels should be installed before the daemon is
started up (e.g. sockets set up by systemd, via socket activation), as
systemd will immediately proceed starting follow-up units.
If set to forking, it is expected that the process configured with
ExecStart= will call fork() as part of its start-up. The parent
process is expected to exit when start-up is complete and all
communication channels are set up. The child continues to run as the
main daemon process. This is the behavior of traditional UNIX daemons.
If this setting is used, it is recommended to also use the PIDFile=
option, so that systemd can identify the main process of the daemon.
systemd will proceed with starting follow-up units as soon as the
parent process exits.
Behavior of oneshot is similar to simple; however, it is expected that
the process has to exit before systemd starts follow-up units.
RemainAfterExit= is particularly useful for this type of service. This
is the implied default if neither Type= or ExecStart= are specified.
Behavior of dbus is similar to simple; however, it is expected that
the daemon acquires a name on the D-Bus bus, as configured by
BusName=. systemd will proceed with starting follow-up units after the
D-Bus bus name has been acquired. Service units with this option
configured implicitly gain dependencies on the dbus.socket unit. This
type is the default if BusName= is specified.
Behavior of notify is similar to simple; however, it is expected that
the daemon sends a notification message via sd_notify(3) or an
equivalent call when it has finished starting up. systemd will proceed
with starting follow-up units after this notification message has been
sent. If this option is used, NotifyAccess= (see below) should be set
to open access to the notification socket provided by systemd. If
NotifyAccess= is not set, it will be implicitly set to main. Note that
currently Type=notify will not work if used in combination with
PrivateNetwork=yes.
Behavior of idle is very similar to simple; however, actual execution
of the service binary is delayed until all jobs are dispatched. This
may be used to avoid interleaving of output of shell services with the
status output on the console.
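
Applied to the unit from the question, the change could look like the sketch below. It assumes, as the question states, that startup-optis.sh ends with an exec of the java process, so that java becomes the main process systemd tracks. Everything except the Type= line is copied from the question.

```ini
[Unit]
Description=Geoserver
After=network.target

[Service]
# simple: the unit counts as started once the process has been launched,
# so systemctl start returns instead of waiting for the process to exit
Type=simple
ExecStart=/usr/local/geoserver/bin/startup-optis.sh
ExecStop=/usr/local/geoserver/bin/shutdown-optis.sh
User=geoserver

[Install]
WantedBy=multi-user.target
```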