systemd restart service on watchdog does terminate previous hanged instance - daemon

I'm trying to setup systemd service configuration to restart service on watchdog failure. If my application does not call sd_notify() in time, systemd spawns new instance.
However, previus instance is not killed. After some time, I have many instances of my application running.
$ systemctl status my-daemon.service
Loaded: loaded (/lib/systemd/system/my-daemon.service; disabled)
Active: active (running) since Tue, 26 Aug 2014 10:27:46 +0000; 7s ago
Main PID: 1433 (attendance-syst)
CGroup: name=systemd:/system/my-daemon.service
├ 1281 /usr/local/bin/my-daemon
├ 1384 /usr/local/bin/my-daemon
├ 1407 /usr/local/bin/my-daemon
└ 1433 /usr/local/bin/my-daemon
...
This is part of my service file:
[Service]
ExecStart=/usr/local/bin/my-daemon
TimeoutStopSec=5
WatchdogSec=10
Restart=on-failure
How can i configure systemd to kill instances which fails on watchdog?
I have already read manual page but it didn't help me.
I thought Restart=on-failure shall restart hanged process by default...

It's a bug and it's already fixed in newer versions of systemd.
In systemd 208 (available for debian jessie) it works correctly.
In systemd 204 (available for debian wheezy via backports) it's still broken.
I haven't found exact release where they fixed it.

Related

Mosquitto 2.0 config still not working on Raspberry Pi

I'm running an MQTT server mosquitto version 2.0.11 on the same Raspberry Pi Bullseye (3 A+) as both broker and client. I had code working, but understand that one needs to modify a .conf file to get things working. I must still not be understanding something because here's my file:
# I had pid_file /run/mosquitto/mosquitto.pid below, but changed this when docs suggested below should be included if running automatically when device boots, which it will be.
pid_file /var/run/mosquitto/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
log_dest file /var/log/mosquitto/mosquitto.log
include_dir /etc/mosquitto/conf.d
listener 1883
allow_anonymous true
Now when I try to run mosquitto like this:
mosquitto -c /etc/mosquitto/conf.d/mosquitto.conf
I get this error:
1637370455: Loading config file /etc/mosquitto/conf.d/mosquitto.conf
1637370455: Error: Duplicate pid_file value in configuration.
1637370455: Error found at /etc/mosquitto/conf.d/mosquitto.conf:7.
1637370455: Error found at /etc/mosquitto/conf.d/mosquitto.conf:14.
Line 7 is the pid_file /var/run/mosquitto/mosquitto.pid
Line 14 is the include_dir /etc/mosquitto/conf.d
I can make basic pub and sub tests with localhost but still no luck with the hostname. Yes I know you should use security but I have an app that controls a robot over local WiFi and want to preserve app usage without changing that component too.
Any help on getting me back on track to getting the Mosquitto broker & client working on the same pi, allowing anonymous access, and running, is much appreciated. I hav gone through the docs, example file, and consulted other tutorials like Steve’s but proper configuration is still unclear. Thx!
Firstly the errors about not being able to open the pid or log files are because you are running mosquitto as a normal user (probably pi). This user does not have permission to read/write to file in /var/run or /var/log hence the failure when you try and run it "manually".
You've not said how you installed 2.0.11, as the default version bundled with Bullseys is still a 1.5.x build. Assuming you used the mosquitto.org repository then the mosquitto service will have been installed and configured. It will automatically pick up the default config file at /etc/mosquitto/mosquitto.conf as should be displayed with:
$ sudo service mosquitto status
● mosquitto.service - Mosquitto MQTT Broker
Loaded: loaded (/lib/systemd/system/mosquitto.service; enabled; vendor preset
Active: active (running) since Sun 2021-10-31 17:28:52 GMT; 2 weeks 5 days ag
Docs: man:mosquitto.conf(5)
man:mosquitto(8)
Process: 499 ExecStartPre=/bin/mkdir -m 740 -p /var/log/mosquitto (code=exited
Process: 505 ExecStartPre=/bin/chown mosquitto /var/log/mosquitto (code=exited
Process: 507 ExecStartPre=/bin/mkdir -m 740 -p /run/mosquitto (code=exited, st
Process: 510 ExecStartPre=/bin/chown mosquitto /run/mosquitto (code=exited, st
Process: 25679 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCE
Main PID: 511 (mosquitto)
Tasks: 1 (limit: 2181)
CGroup: /system.slice/mosquitto.service
└─511 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf
Nov 19 00:00:10 www systemd[1]: Reloading Mosquitto MQTT Broker.
Nov 19 00:00:10 www systemd[1]: Reloaded Mosquitto MQTT Broker.
Warning: Journal has been rotated since unit was started. Log output is incomple
The simplest way to enable access from other machines is to do the following:
Reset the default config file to as it was when installed
# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example
pid_file /var/run/mosquitto/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
log_dest file /var/log/mosquitto/mosquitto.log
port 1883
include_dir /etc/mosquitto/conf.d
create a new file in /etc/mosquitto/conf.d e.g. called connect.conf
listener 1883
allow_anonymous true
restart the service with sudo service mosquitto restart

systemd service activation for Python script fails

I want to register a python script as a daemon service, executed at system startup and running continuously in the background. The script opens network sockets, a local log file and executes a number of threads. The script is well-formed and runs without any compilation or runtime issues.
I used below service file for registration:
[Unit]
Description=ModBus2KNX Gateway Daemon
After=multi-user.target
[Service]
Type=simple
ExecStart=/usr/bin/python3 /usr/bin/ModBusDaemon.py
[Install]
WantedBy=multi-user.target
Starting the service results in below error:
● ModBusDaemon.service - ModBus2KNX Gateway Daemon
Loaded: loaded (/lib/systemd/system/ModBusDaemon.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2021-01-04 21:46:29 CET; 6min ago
Process: 1390 ExecStart=/usr/bin/python3 /usr/bin/ModBusDaemon.py (code=exited, status=1/FAILURE)
Main PID: 1390 (code=exited, status=1/FAILURE)
Jan 04 21:46:29 raspberrypi systemd[1]: Started ModBus2KNX Gateway Daemon.
Jan 04 21:46:29 raspberrypi systemd[1]: ModBusDaemon.service: Main process exited, code=exited, status=1/FAILURE
Jan 04 21:46:29 raspberrypi systemd[1]: ModBusDaemon.service: Failed with result 'exit-code'.
Appreciate your support!
Related posts brought me to the resolution for my issue. Ubuntu systemd custom service failing with python script refers to the same issue. The proposed solution adding the WorkingDirectory to the Service section resolved the issue for me. Though, I could not find the adequate systemd documentation outlining on the implicit dependency.
As MBizm saim you must also add WorkingDirectory.
And After that you must also run these commands:
sudo systemctl daemon-reload
sudo systemctl enable your_service.service
sudo systemctl start your_service.service

supervisord not starting automatically on Ubuntu 20.04

I intalled latest superviosrd
pip install supervisord==4.2.1
and then I copied supervisord init script to /etc/init.d/supervisord
scp /path/to/init_script.sh /etc/init.d/supervisord
sudo update-rc.d supervisord defaults
and then I run:
sudo service supervisord start && sudo service supervisord status
got:
● supervisord.service - LSB: Starts supervisord - see http://supervisord.org
Loaded: loaded (/etc/init.d/supervisord; generated)
Active: active (exited) since Sat 2020-08-29 13:03:57 UTC; 19min ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 1074)
Memory: 0B
CGroup: /system.slice/supervisord.service
Aug 29 13:03:57 vagrant systemd[1]: Starting LSB: Starts supervisord - see http://supervisord.org...
Aug 29 13:03:57 vagrant systemd[1]: Started LSB: Starts supervisord - see http://supervisord.org.
and then I check
supervisorctl
got
unix:///tmp/supervisor.sock no such file
supervisor>
However, if I manually run
supervisord
it will start:
redis STARTING
supervisor>
Any ideas on how I can make it run automatically on start? and why sudo service supervisord start does not start it?
To answer my own question, it is because the init script I was using has the following:
NAME=supervisord
DAEMON=/usr/bin/$NAME
SUPERVISORCTL=/usr/bin/supervisorctl
however, supervisord is installed in /usr/local/bin/, so I changed the path, and everything works now.

PG::ConnectionBad Postgres Cluster down

Digitalocean disabled my droplet's internet access. After fixing the error (rollback to older backup) they restored the internet access. But afterwards I constantly get an error when deploying, I can't seem to get my Postgres database up and running.
I'm getting an error each time I try to deploy my application.
PG::ConnectionBad: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
So I used SSH to login to my server and check if my Postgres was actually running with:
pg_lsclusters
Results into:
Ver Cluster Port Status Owner Data directory Log file
9.5 main 5432 down postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log
Postgres server status
So my Postgres server seems to be down. I tried putting it 'up' again with:
pg_ctlcluster 9.5 main start After doing so I got the error: Insecure directory in $ENV{PATH} while running with -T switch at /usr/bin/pg_ctlcluster line 403.
And /usr/bin/pg_ctlcluster on line 403 says:
system 'systemctl', 'is-active', '-q', "postgresql\#$version-$cluster";
But I'm not to sure what the problem could be here and how I could fix this.
Update
I also tried updating the permissions on /bin to 755 as mentioned here. Sadly that did not fix my problem.
Update 2
I changed the /usr/bin to 755. Now when I try pg_ctlcluster 9.5 main start, I get this:
Job for postgresql#9.5-main.service failed because the control process exited with error code. See "systemctl status postgresql#9.5-main.service" and "journalctl -xe" for details.
And inside the systemctl status postgresql#9.5-main.service:
postgresql#9.5-main.service - PostgreSQL Cluster 9.5-main
Loaded: loaded (/lib/systemd/system/postgresql#.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2018-01-28 17:32:38 EST; 45s ago
Process: 22473 ExecStart=postgresql#%i --skip-systemctl-redirect %i start (code=exited, status=1/FAILURE)
Jan 28 17:32:08 *url* systemd[1]: Starting PostgreSQL Cluster 9.5-main...
Jan 28 17:32:38 *url* postgresql#9.5-main[22473]: The PostgreSQL server failed to start.
Jan 28 17:32:38 *url* systemd[1]: postgresql#9.5-main.service: Control process exited, code=exited status=1
Jan 28 17:32:38 *url* systemd[1]: Failed to start PostgreSQL Cluster 9.5-main.
Jan 28 17:32:38 *url* systemd[1]: postgresql#9.5-main.service: Unit entered failed state.
Jan 28 17:32:38 *url* systemd[1]: postgresql#9.5-main.service: Failed with result 'exit-code'.
Thanks!
You better not mix systemctl and pg_ctlcluster. Let systemctl makes the calls to pg_ctlcluster with the right user and permissions. You should start your postgresql instance with
sudo systemctl start postgresql#9.5-main.service
Also, check the errors in the startup log. You can post them too, to help you figure out what's going on.
Your systemctl status also outputs that the service is disable, so, when the server reboots, you will have to start the service manually. To enable it run:
sudo systemctl enable postgresql#9.5-main.service
I hope it helps
It is mainly because /etc/hosts file is somehow changed.I have removed extra space inside /etc/hosts file.Use cat /etc/hosts
Add these lines into the file
127.0.0.1 localhost
127.0.1.1 your-host-name
::1 ip6-localhost ip6-loopback
And I have given permission 644 to /etc/hosts file.It is working for me even after the reboot of the system.

Failed to start puppetserver Service

While trying to run a puppet update form a node:
sudo /opt/puppetlabs/bin/puppet agent -t
I get an error:
Error: Could not retrieve catalog; skipping run
Error: Could not send report: Connection refused - connect(2) for "puppet" port 8140`
Elsewhere indicates this is likely a problem with the puppetserver service, and suggests to reboot the server. Restarting didn't help, and when I try to restart the service I get failure:
~$ sudo service puppetserver restart
Job for puppetserver.service failed because the control process exited with error code. See "systemctl status puppetserver.service" and "journalctl -xe" for details.
I've looked at these logs, and as a puppet/linux noob, I'm not sure what to do next.
systemctl status puppetserver.service
● puppetserver.service - puppetserver Service
Loaded: loaded (/lib/systemd/system/puppetserver.service; enabled; vendor preset: enabled)
Active: activating (start-post) since Fri 2016-09-02 15:54:26 PDT; 2s ago
Process: 22301 ExecStartPre=/usr/bin/install --directory --owner=puppet --group=puppet --mode=775 /var/run/puppetlabs/puppetserver (code=exited
Main PID: 22306 (java); : 22307 (bash)
Tasks: 17
Memory: 335.7M
CPU: 5.535s
CGroup: /system.slice/puppetserver.service
├─22306 /usr/bin/java -Xms6g -Xmx6g -XX:MaxPermSize=256m -XX:OnOutOfMemoryError=kill -9 %p -Djava.security.egd=/dev/urandom -cp /opt/p
└─control
├─22307 /bin/bash /opt/puppetlabs/server/apps/puppetserver/ezbake-functions.sh wait_for_app
└─22331 sleep 1
Sep 02 15:54:26 puppet systemd[1]: Starting puppetserver Service...
Sep 02 15:54:26 puppet java[22306]: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
puppet version 4.6.1
The puppet master communicates with the other node using port number 8140.
I don't think a restart will help, since this looks like a connection issue between the server and the node.
please try the following -
first make sure that the puppet master is actually listening on port 8140. run the following command on the puppetmaster -
netstat -ntlp | grep 8140
this command should return something like this -
tcp 0 0 0.0.0.0:8140 0.0.0.0:* LISTEN 1783/puppetmaster
If you don't get the same output, your puppetmaster is not listening, and therefore can not compile catalogs for the node.
Try checking the puppet master log at /var/log/puppetmaster.log
check that the node can communicate with the puppetmaster on the relevant port. you can check this quickly with the telnet command. run this on your node -
telnet < puppetmaster ip address \ dns name> 8140
you should get something like -
Connected to <puppet-master-IP/DNS-name>
Escape character is '^]'.
if you don't get this output, this means that something is blocking you from accessing the puppetmaster. try opening the port in your firewall to access the puppetmaster.
if you're still stuck try using the --debug flag for verbose output and edit your question.
Could be 2 things: (1) in puppet.conf you have configured more memory than you have on your machine. Or (2) You installed both apt-get install puppetserver and apt-get install puppet.
If you get failed to start puppet.service: unit not found. error on slave machine while connecting to puppet.
Close the putty and then again open and connect it.The issue wont come while starting putty on slave.
The error occurs because there is not enough RAM and to fix the error, open the Puppet server configuration file:
sudo nano /etc/sysconfig/puppetserver
And reduce the amount of allocated RAM for the Puppet server (for example, I specified 512m instead of 2g):
JAVA_ARGS="-Xms512m -Xmx512m"
Now let’s start the Puppet server:
sudo systemctl start puppetserver