Managing ZFS scrub via systemd - ubuntu-16.04

I would like to automate zpool scrubbing on my ZFS server. I have written the following systemd service (as /etc/systemd/system/zfs-scrub.service):
[Unit]
Description=Scrub ZFS tank pool
[Service]
Type=simple
ExecStart=/bin/sh -c 'zpool scrub `zpool list -H -o name`'
RemainAfterExit=yes
ExecStop=/bin/sh -c 'zpool scrub -s `zpool list -H -o name`'
As well as a timer (as /etc/systemd/system/zfs-scrub.timer):
[Unit]
Description=Run zpool scrub every 1st Saturday
[Timer]
OnCalendar=Sat *-*-* 22:00:00
After having started it a few weeks back, I checked to see if it behaved. It seems that systemd still thinks the service is running, so the timer didn't run.
It seems there is no ExecStatus, so systemd doesn't know that the service completed.
Am I missing something? Should I instead write a script that starts the scrub, greps the zpool status output, and catches signals to stop the scrub when systemd signals it?
Is it possible to write an OnCalendar line that means "once per month, only on weekends"?

To answer your second question, the following can be read as running the first seven days of each month at a minute past midnight, but only if that day is also a Sunday. Thus, it should run the first Sunday of every month.
OnCalendar=Sun *-*-01..07 00:01:00
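On systemd releases newer than the 229 that ships with Ubuntu 16.04, you can sanity-check a calendar expression like this and see when it will next elapse:
systemd-analyze calendar 'Sun *-*-01..07 00:01:00'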
Regarding your first question, it would be helpful to see what the journal says, rather than just confirming it ran via the zpool status command. I notice that your service unit does not require the zfs.target. Also, I would make the service Type "oneshot" rather than "simple".
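For example, using the unit name from the question:
journalctl -u zfs-scrub.service
systemctl status zfs-scrub.service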
EDIT: This should work for you, though you need to start/enable the timer for each pool.
Try this for your zfs-scrub@.service file:
[Unit]
Description=Scrub ZFS pool %i
[Service]
Type=oneshot
ExecStart=/sbin/zpool scrub %i
Then this for your zfs-scrub@.timer file:
[Unit]
Description=Scrub ZFS pool %i on the first Sunday of the month
[Timer]
OnCalendar=Sun *-*-01..07 00:01:00
[Install]
WantedBy=timers.target
You would then start and enable the timer for each pool with:
systemctl start zfs-scrub@[pool name].timer
systemctl enable zfs-scrub@[pool name].timer
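For example, assuming a single pool named tank (substitute your own pool name), the full sequence would look something like this:
sudo systemctl daemon-reload
sudo systemctl start zfs-scrub@tank.timer
sudo systemctl enable zfs-scrub@tank.timer
systemctl list-timers 'zfs-scrub@*'
The last command should show the timer along with the time it will next elapse.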

Related

Systemd timer for a bound service restarts the service it is bound to

I have two systemd services a and b, where b is "After" and "BindsTo" a, and b is a short command that is launched every minute with a systemd timer.
Here's my config:
$ cat /systemd/a.service
[Unit]
After=foo
BindsTo=foo
[Service]
ExecStart=/opt/a/bin/a
Group=lev
User=lev
Restart=Always
WorkingDirectory=/opt/a
$ cat /systemd/b.service
[Unit]
After=a
BindsTo=a
[Service]
ExecStart=/opt/b/bin/b
Group=lev
User=lev
WorkingDirectory=/opt/b
$ cat /systemd/b.timer
[Unit]
[Timer]
OnCalendar=*:0/1:00
When I run sudo systemctl stop a, service a is indeed stopped, but then it is started back up at the top of the next minute, when the timer for service b runs b.
The systemd documentation states that BindsTo
declares that if the unit bound to is stopped, this unit will be stopped too.
(https://www.freedesktop.org/software/systemd/man/systemd.unit.html#BindsTo=)
I expect that by stopping a, b will also be stopped, and the timer disabled. This is not the case. Can you help explain why the b timer restarts not only b (which should fail), but also a?
Can you also help me edit these services such that:
on boot, a is started first, then b is started
when I sudo systemctl stop a, b's timer does not run
when I sudo systemctl start a, b's timer begins running again
Thanks in advance!
Here are the simplest units that meet your constraints:
test-a.service
[Service]
# long-running command
ExecStart=/bin/sleep 3600
test-b.service
[Service]
# short command
ExecStart=/bin/date
test-b.timer
[Unit]
After=test-a.service
# makes test-b.timer stop when test-a.service stops
BindsTo=test-a.service
[Timer]
OnCalendar=*-*-* *:*:00
[Install]
# makes test-b.timer start when test-a.service starts
WantedBy=test-a.service
Don't forget to run:
systemctl daemon-reload
systemctl disable test-b.timer
systemctl enable test-b.timer
to apply the changes in the [Install] section.
Explanations:
What you want is to bind a.service to b.timer, not to b.service.
b.service is only a short command, and systemctl start b.service will only run the command, not start the associated timer.
Only systemctl start b.timer will start the timer.
The WantedBy tells systemd to start test-b.timer when test-a.service starts.
The BindsTo tells test-b.timer to stop when test-a.service stops.
The After only ensures that test-b.timer is not started at the same time as test-a.service: it forces systemd to start test-b.timer after test-a.service has finished starting.
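As a quick check (a sketch assuming the test-a/test-b units above are installed and enabled), you can watch the timer follow the service:
sudo systemctl start test-a.service
systemctl list-timers 'test-b*'
# test-b.timer should now be listed as active
sudo systemctl stop test-a.service
systemctl list-timers 'test-b*'
# test-b.timer should no longer be listed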
About the behaviour you observed:
When you stopped your a.service, the b.timer was still active and it tried starting b.service to run its short command. Since your b.service specified BindsTo=a.service, systemd thought that b.service required a.service to be started also, and effectively restarted a.service for b.service to run correctly.
I could be mistaken, but I believe that the "Restart=Always" option is the reason that the service named a is started and hence why the service named b is not subsequently stopped.
The man page for systemd.service states that if this option is set to always
the service will be restarted regardless of whether it exited cleanly
or not, got terminated abnormally by a signal, or hit a timeout.
https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart=
So even though you are stopping the service, this option is starting it again.
You can test this by running the following commands. Since you have service "b" on a one minute timer, I would run the stop command at 10 seconds after the top of the minute (i.e. 10:00:10). Then I would run the status command 20 seconds later and see if the service has been restarted.
sudo systemctl stop a
sudo systemctl status a b

systemd not executing ExecStop script

I have created a service (named develop) using systemd. Following is the content of my develop unit file:
[Unit]
Description=Develop Manager Service
[Service]
Type=forking
PIDFile = /home/nayasa/data/var/run/developPid
User=root
Group=root
ExecStartPre = /bin/bash /home/nayasa/control_scripts/develop_startPre.sh
ExecStart =/bin/bash /home/nayasa/control_scripts/develop_start.sh
ExecStop =/bin/bash /home/nayasa/control_scripts/develop_stop.sh
[Install]
WantedBy=multi-user.target
My develop.service forks multiple processes during runtime.
Whenever I run systemctl stop develop.service, systemd stops all processes in the CGroup of my develop service, whereas the develop_stop script that I have provided only kills the main process using the pid from the pidfile. I want to stop only the main process. It seems to me that systemd is not using my stop script. How do I force systemd to execute my stop script to stop the service and not kill all processes of the CGroup? FYI - I know that using the KillMode option I can direct systemd to kill only the main process and leave the other processes, but I want to know why my script is not being executed.
It's a little weird to expect orphaned processes to persist after stopping a service. You would be left with a system that's in an unknown state. What would happen if you started the service again?
I think what you probably want is more complicated than a single service.
Let's say you wanted develop.service to launch proc1 and proc2. You want systemctl stop develop.service to kill proc1 but not proc2. In this case, you still need something to manage proc2, otherwise you have a rogue, orphaned, unmanaged, and unmonitored process. The answer is to use another service.
Instead, try making two services. develop.service would launch proc1, possibly using your scripts. Then add a Wants=proc2.service to your [Unit] section. proc2.service would be responsible for proc2.
This means systemctl start develop.service will launch proc1 and proc2. Meanwhile systemctl stop develop.service will only kill proc1. proc2 can still be stopped/monitored by inspecting proc2.service.
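A minimal sketch of that layout, reusing the paths from the question and a hypothetical start script for proc2 (adjust names and paths to your setup):
develop.service:
[Unit]
Description=Develop Manager Service (proc1)
# pull in the second service whenever this one is started
Wants=proc2.service
[Service]
Type=forking
PIDFile=/home/nayasa/data/var/run/developPid
ExecStartPre=/bin/bash /home/nayasa/control_scripts/develop_startPre.sh
ExecStart=/bin/bash /home/nayasa/control_scripts/develop_start.sh
ExecStop=/bin/bash /home/nayasa/control_scripts/develop_stop.sh
[Install]
WantedBy=multi-user.target
proc2.service:
[Unit]
Description=Second process of the develop stack (proc2)
[Service]
# hypothetical command; point this at whatever actually starts proc2
ExecStart=/bin/bash /home/nayasa/control_scripts/proc2_start.sh
[Install]
WantedBy=multi-user.target
With this split, systemctl stop develop.service only touches proc1's cgroup, while proc2 keeps running under its own unit.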

Create systemd unit in cloud-init and enable it

I've created the following systemd unit in the cloud-init file:
write_files:
  - path: /etc/systemd/system/multi-user.target.wants/docker-compose.service
    owner: root:root
    permissions: '0755'
    content: |
      [Unit]
      Description=Docker Compose Boot Up
      Requires=docker.service
      After=docker.service
      [Service]
      Type=simple
      ExecStart=/usr/local/bin/docker-compose -f /opt/data/docker-compose.yml up -d
      Restart=always
      RestartSec=30
      [Install]
      WantedBy=multi-user.target
When I try to run
sudo systemctl enable docker-compose.service
to create the symlink I get this:
Failed to execute operation: No such file or directory
However I'm sure that the file is under /etc/systemd/system/multi-user.target.wants
I had the same need, but I was working from a recipe that said to create /etc/systemd/system/unit.service and then do systemctl enable --now unit.
So I created the unit file with write_files and did the reload and enable in a text/x-shellscript part, and that worked fine. (User scripts run last and in order, while I don't think there are any guarantees about when the write_files key in the user-data is processed. I found out the hard way that it runs before the users key, so you can't set ownership to users that cloud-init creates.)
I think runcmd entries are converted to user scripts and run in list order (either before or after the other user scripts), so if you don't like x-shellscript parts you can do the reload and enable that way. /var/log/cloud-init.log is where I check the order; there is probably a config file too.
Full disclosure: I forgot the systemctl daemon-reload command, but it still worked. There is a caveat, though, against systemd manipulations from cloud-init: cloud-init itself runs under systemd, and some systemd commands may wait for cloud-init to finish -- deadlock!
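A minimal cloud-config sketch of that recipe, reusing the unit contents from the question; note that it writes the unit to /etc/systemd/system/ and lets systemctl enable create the wants symlink (the enable --now step is subject to the caveat above):
#cloud-config
write_files:
  - path: /etc/systemd/system/docker-compose.service
    owner: root:root
    permissions: '0644'
    content: |
      [Unit]
      Description=Docker Compose Boot Up
      Requires=docker.service
      After=docker.service
      [Service]
      Type=simple
      ExecStart=/usr/local/bin/docker-compose -f /opt/data/docker-compose.yml up -d
      Restart=always
      RestartSec=30
      [Install]
      WantedBy=multi-user.target
runcmd:
  - systemctl daemon-reload
  - systemctl enable --now docker-compose.service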
After the unit file has been created, but before any manipulations with it, systemd should be notified of the changes:
systemctl daemon-reload
So the cloud-init YAML block creating the docker-compose.service file should be followed by:
runcmd:
- systemctl daemon-reload
Check that every file involved is present and valid:
ls -l /etc/systemd/system/multi-user.target.wants/docker-compose.service
ls -l /usr/local/bin/docker-compose
ls -l /opt/data/docker-compose.yml
systemd-analyze verify /etc/systemd/system/multi-user.target.wants/docker-compose.service
Also consider the timing. Even if the files exist once the system has fully booted, would /etc/systemd/system/multi-user.target.wants/ exist at the point when cloud-init runs?

Make slurmd wait for the NFS mount under systemd

I use Slurm and I want my slurmd daemon, managed by systemd, to wait until my NFS share is mounted.
This is my slurmd.service:
[Unit]
Description=Slurm node daemon
After=network.target nfs-client.target nfs-client.service
ConditionPathExists=/etc/slurm/slurm.conf
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/slurmd
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurmd.pid
KillMode=process
LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity
[Install]
WantedBy=multi-user.target
I want my service to run only once my NFS share is fully mounted. The NFS share is mounted at /nfs.
Because my network is slow and the NFS share is large, it takes about one minute for it to be fully mounted.
Slurm needs to be able to write files into the /nfs/slurm folder.
Currently, when CentOS boots and the slurmd daemon starts, I get the error "/nfs/slurm no such file or folder".
I tried the PathExists parameter and TimeoutStartSec, but neither worked: my daemon still starts and I get this error.
Thanks in advance for your help.
Systemd has RequiresMountsFor.
You can add the following line to the [Unit] section of slurmd.service.
RequiresMountsFor=/nfs
Keep in mind that if you are using the resume and suspend features, your ResumeTimeout must be greater than the resume time plus the NFS mount time.
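With that line added, the [Unit] section of the slurmd.service from the question would look like this; systemd will then order slurmd after the /nfs mount unit and require it:
[Unit]
Description=Slurm node daemon
After=network.target nfs-client.target nfs-client.service
ConditionPathExists=/etc/slurm/slurm.conf
RequiresMountsFor=/nfs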
Write a script that checks whether the NFS mount is available and, if not, sleeps for X seconds. Then use this script in ExecStartPre= of the systemd service file.
$ cat /usr/local/bin/checkNFSMount
#!/bin/bash
if [ -d /nfs/slurm ]; then
    exit 0
else
    sleep 20
    exit 0
fi
In the systemd service file:
ExecStartPre=/usr/local/bin/checkNFSMount
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
Read about ExecStartPre in the systemd.service man page.
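A fixed 20-second sleep is only a guess; a slightly more robust variant (a sketch along the same lines, not part of the original answer) polls until the directory appears or a timeout expires, and fails the start otherwise:
#!/bin/bash
# Wait up to 60 seconds for /nfs/slurm to appear, checking once per second.
for i in $(seq 1 60); do
    if [ -d /nfs/slurm ]; then
        exit 0
    fi
    sleep 1
done
# Give up; a non-zero exit makes ExecStartPre= fail the unit start,
# so the problem is visible in systemctl status slurmd.
exit 1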

Docker and systemd - service stopping after 10 seconds

I'm having trouble getting a Docker container to stay up when it's started by systemd. When I start it manually with sudo docker start containername, it stays up without trouble, but when it's started via systemd with sudo systemctl start containername, it stays up for 10 seconds then mysteriously dies, leaving messages in syslog something like the following:
Mar 13 14:01:09 hostname docker[329]: time="2015-03-13T14:01:09Z" level="info" msg="POST /v1.17/containers/containername/stop?t=10"
Mar 13 14:01:09 hostname docker[329]: time="2015-03-13T14:01:09Z" level="info" msg="+job stop(containername)"
I am making the assumption that it's systemd killing the process, but I can't work out why it might be happening. The systemd unit file (/etc/systemd/system/containername.service) is pretty simple, as follows:
[Unit]
Description=MyContainer
After=docker.service
Requires=docker.service
[Service]
ExecStart=/usr/bin/docker start containername
ExecStop=/usr/bin/docker stop containername
[Install]
WantedBy=multi-user.target
Docker starts fine on boot, and it looks like it even starts the container, but whether started on boot or manually, it then quits after exactly 10 seconds. Help gratefully received!
Solution: The start command seems to need the -a (attach) parameter, as described in the documentation, when used in a systemd unit. I assume this is because docker start by default detaches to the background, although the systemd "expect daemon" feature doesn't appear to fix the issue.
from the docker-start manpage:
-a, --attach=true|false
Attach container's STDOUT and STDERR and forward all signals to the process. The default is false.
The whole systemd script then becomes:
[Unit]
Description=MyContainer
After=docker.service
Requires=docker.service
[Service]
ExecStart=/usr/bin/docker start -a containername
ExecStop=/usr/bin/docker stop containername
[Install]
WantedBy=multi-user.target
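After editing the unit file, a quick way to verify the fix (assuming the unit shown above) is:
sudo systemctl daemon-reload
sudo systemctl restart containername.service
sleep 15
sudo systemctl status containername.service
The service should still be reported as active well past the 10-second mark.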