Gunicorn socket disappears

Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise
gunicorn (version 19.1.1)
nginx version: nginx/1.1.19
My gunicorn conf:
bind = ["unix:///tmp/someproj1.sock", "unix:///tmp/someproj2.sock"]
pythonpath = "/home/deploy/someproj/someproj"
workers = 5
worker_class = "eventlet"
worker_connections = 25
timeout = 3600
graceful_timeout = 3600
We started getting 502s at around 2PM yesterday in our dev env. This was in the Nginx error log:
connect() to unix:///tmp/someproj1.sock failed (2: No such file or directory) while connecting to upstream
Both gunicorn sockets were missing from /tmp.
At 11:55AM today I ran ps -eo pid,cmd,etime|grep gunicorn to get the uptime:
4156 gunicorn: master [myproj.    22:53:54
4161 gunicorn: worker [myproj.    22:53:54
4162 gunicorn: worker [myproj.    22:53:54
4163 gunicorn: worker [myproj.    22:53:54
4164 gunicorn: worker [myproj.    22:53:54
4165 gunicorn: worker [myproj.    22:53:53
5207 grep --color=auto gunicorn        00:00
So gunicorn and all its workers have been running uninterrupted since ~1:01PM yesterday. The Nginx access log confirms that requests were being served successfully for about an hour after gunicorn was started. Then, for some reason, both gunicorn sockets disappeared, and gunicorn kept running without writing any error logs.
Any ideas on what could cause that? Or how to fix it?

It turns out this was indeed a bug: eventlet workers would delete the socket when they themselves restarted.
The fix has already been merged into the master branch, but unfortunately it has not been released yet (version 19.3 still has the problem).
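If you need the fix before it lands in a release, one option (assuming you are comfortable running gunicorn from its upstream master branch) is to install it straight from the GitHub repository, for example:
# workaround sketch: install gunicorn from upstream master, which already
# contains the merged eventlet socket fix (not in 19.3)
pip install --upgrade git+https://github.com/benoitc/gunicorn.git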


Mosquitto 2.0 config still not working on Raspberry Pi

I'm running an MQTT server, mosquitto version 2.0.11, on a Raspberry Pi 3 A+ (Bullseye), with the same Pi acting as both broker and client. I had code working, but I understand that one needs to modify a .conf file to get things working. I must still be misunderstanding something, because here's my file:
# I had pid_file /run/mosquitto/mosquitto.pid here, but changed it when the docs suggested the line below should be used if running automatically when the device boots, which it will be.
pid_file /var/run/mosquitto/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
log_dest file /var/log/mosquitto/mosquitto.log
include_dir /etc/mosquitto/conf.d
listener 1883
allow_anonymous true
Now when I try to run mosquitto like this:
mosquitto -c /etc/mosquitto/conf.d/mosquitto.conf
I get this error:
1637370455: Loading config file /etc/mosquitto/conf.d/mosquitto.conf
1637370455: Error: Duplicate pid_file value in configuration.
1637370455: Error found at /etc/mosquitto/conf.d/mosquitto.conf:7.
1637370455: Error found at /etc/mosquitto/conf.d/mosquitto.conf:14.
Line 7 is the pid_file /var/run/mosquitto/mosquitto.pid
Line 14 is the include_dir /etc/mosquitto/conf.d
I can make basic pub and sub tests with localhost, but still no luck with the hostname. Yes, I know you should use security, but I have an app that controls a robot over local WiFi and want to preserve app usage without changing that component too.
Any help on getting me back on track to get the Mosquitto broker and client working on the same Pi, allowing anonymous access, is much appreciated. I have gone through the docs, the example file, and other tutorials like Steve's, but the proper configuration is still unclear. Thx!
Firstly, the errors about not being able to open the pid or log files are because you are running mosquitto as a normal user (probably pi). This user does not have permission to read/write files in /var/run or /var/log, hence the failure when you try to run it "manually".
You've not said how you installed 2.0.11, as the default version bundled with Bullseye is still a 1.5.x build. Assuming you used the mosquitto.org repository, the mosquitto service will have been installed and configured. It will automatically pick up the default config file at /etc/mosquitto/mosquitto.conf, as shown by:
$ sudo service mosquitto status
● mosquitto.service - Mosquitto MQTT Broker
Loaded: loaded (/lib/systemd/system/mosquitto.service; enabled; vendor preset
Active: active (running) since Sun 2021-10-31 17:28:52 GMT; 2 weeks 5 days ag
Docs: man:mosquitto.conf(5)
man:mosquitto(8)
Process: 499 ExecStartPre=/bin/mkdir -m 740 -p /var/log/mosquitto (code=exited
Process: 505 ExecStartPre=/bin/chown mosquitto /var/log/mosquitto (code=exited
Process: 507 ExecStartPre=/bin/mkdir -m 740 -p /run/mosquitto (code=exited, st
Process: 510 ExecStartPre=/bin/chown mosquitto /run/mosquitto (code=exited, st
Process: 25679 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCE
Main PID: 511 (mosquitto)
Tasks: 1 (limit: 2181)
CGroup: /system.slice/mosquitto.service
└─511 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf
Nov 19 00:00:10 www systemd[1]: Reloading Mosquitto MQTT Broker.
Nov 19 00:00:10 www systemd[1]: Reloaded Mosquitto MQTT Broker.
Warning: Journal has been rotated since unit was started. Log output is incomple
The simplest way to enable access from other machines is to do the following:
1. Reset the default config file to how it was when installed:
# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example
pid_file /var/run/mosquitto/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
log_dest file /var/log/mosquitto/mosquitto.log
port 1883
include_dir /etc/mosquitto/conf.d
2. Create a new file in /etc/mosquitto/conf.d, e.g. called connect.conf, containing:
listener 1883
allow_anonymous true
3. Restart the service with sudo service mosquitto restart.
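Once the service is back up, a quick sanity check with the stock clients (the topic name here is only an example) is to subscribe on the Pi and publish from another machine, first against localhost and then against the Pi's hostname or IP:
# on the Pi itself
mosquitto_sub -h localhost -t test/topic
# from another machine on the LAN (replace the placeholder with the Pi's address)
mosquitto_pub -h <pi-hostname-or-ip> -t test/topic -m "hello"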

Airflow Webserver Shutting down

My Airflow webserver shut down abruptly, around the same time as before (about 16:37 GMT).
My Airflow scheduler runs fine (no crash) and tasks still run.
There is not much in the error logs except:
Handling signal: ttou
Worker exiting (pid: 118711)
ERROR - No response from gunicorn master within 120 seconds
ERROR - Shutting down webserver
Handling signal: term
Worker exiting
Worker exiting
Worker exiting
Worker exiting
Worker exiting
Shutting down: Master
Is this caused by memory?
My webserver settings in airflow.cfg are the defaults:
# Number of seconds the webserver waits before killing gunicorn master that doesn't respond
web_server_master_timeout = 120
# Number of seconds the gunicorn webserver waits before timing out on a worker
web_server_worker_timeout = 120
# Number of workers to refresh at a time. When set to 0, worker refresh is
# disabled. When nonzero, airflow periodically refreshes webserver workers by
# bringing up new ones and killing old ones.
worker_refresh_batch_size = 1
# Number of seconds to wait before refreshing a batch of workers.
worker_refresh_interval = 30
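The Handling signal: ttou line in the log above looks like gunicorn's scale-down signal, which this periodic worker refresh uses. If you want to rule the refresh out, the comment above notes that a batch size of 0 disables it; a minimal sketch of that change, in the same section, would be:
# disable periodic webserver worker refresh (0 = disabled, per the comment above)
worker_refresh_batch_size = 0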
Update:
OK, it doesn't crash every day, but today I have a log showing gunicorn was unable to restart workers:
ERROR - [0/0] Some workers seem to have died and gunicorn did not restart them as expected
Update: 30 October 2020
[CRITICAL] WORKER TIMEOUT (pid:108237)
I am getting this even though I have increased the timeout to 240, twice the default value.
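For reference, that change corresponds to these airflow.cfg lines (it isn't stated which of the two timeouts was raised, so both are shown here at double the 120-second default):
web_server_master_timeout = 240
web_server_worker_timeout = 240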
Anyone know why this keeps happening?

Minikube not starting on Ubuntu, throwing errors

I'm running Ubuntu 17.04 (zesty) on a Dell XPS 13 (3854 MB of RAM, Intel Core i5-5200U CPU @ 2.20GHz) and trying to start up Minikube, but I'm getting a couple of errors when I try to start it.
➜ minikube version
minikube version: v0.22.3
➜ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
I have VirtualBox version 5.2.0 r118431 (Qt5.7.1). I've checked the BIOS settings and have virtualization enabled.
➜ minikube start
Starting local Kubernetes v1.7.5 cluster...
Starting VM...
E1025 09:49:40.206594 22972 start.go:146] Error starting host: Error starting stopped host: Unable to start the VM: /usr/bin/VBoxManage startvm minikube --type headless failed:
VBoxManage: error: The virtual machine 'minikube' has terminated unexpectedly during startup with exit code 1 (0x1)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MachineWrap, interface IMachine
.
Retrying.
E1025 09:49:40.207051 22972 start.go:152] Error starting host: Error starting stopped host: Unable to start the VM: /usr/bin/VBoxManage startvm minikube --type headless failed:
VBoxManage: error: The virtual machine 'minikube' has terminated unexpectedly during startup with exit code 1 (0x1)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MachineWrap, interface IMachine
I've tried some suggestions that I've found online, like running rm -rf ~/.minikube/ and trying to start minikube again. I've tried running minikube stop followed by minikube delete and then trying to start minikube again. I've also tried specifying the VirtualBox driver when starting: minikube start --vm-driver=virtualbox. None of these work; I still get the same error.
This looks like an issue with your VirtualBox installation; have you tried reinstalling it?
sudo apt-get purge virtualbox virtualbox-dkms
sudo apt-get install virtualbox-5.1
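If the error persists after reinstalling, it can help to start the VM by hand so VirtualBox prints the full failure reason instead of the truncated message minikube relays (the command and VM name are taken from the error above):
VBoxManage startvm minikube --type headless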
Try enabling virtualization in the BIOS; in my case that resolved the problem.

Failed to start puppetserver Service

While trying to run a puppet update from a node:
sudo /opt/puppetlabs/bin/puppet agent -t
I get an error:
Error: Could not retrieve catalog; skipping run
Error: Could not send report: Connection refused - connect(2) for "puppet" port 8140
Other sources indicate this is likely a problem with the puppetserver service and suggest rebooting the server. Rebooting didn't help, and when I try to restart the service it fails:
~$ sudo service puppetserver restart
Job for puppetserver.service failed because the control process exited with error code. See "systemctl status puppetserver.service" and "journalctl -xe" for details.
I've looked at these logs, and as a puppet/linux noob, I'm not sure what to do next.
systemctl status puppetserver.service
● puppetserver.service - puppetserver Service
Loaded: loaded (/lib/systemd/system/puppetserver.service; enabled; vendor preset: enabled)
Active: activating (start-post) since Fri 2016-09-02 15:54:26 PDT; 2s ago
Process: 22301 ExecStartPre=/usr/bin/install --directory --owner=puppet --group=puppet --mode=775 /var/run/puppetlabs/puppetserver (code=exited
Main PID: 22306 (java); : 22307 (bash)
Tasks: 17
Memory: 335.7M
CPU: 5.535s
CGroup: /system.slice/puppetserver.service
├─22306 /usr/bin/java -Xms6g -Xmx6g -XX:MaxPermSize=256m -XX:OnOutOfMemoryError=kill -9 %p -Djava.security.egd=/dev/urandom -cp /opt/p
└─control
├─22307 /bin/bash /opt/puppetlabs/server/apps/puppetserver/ezbake-functions.sh wait_for_app
└─22331 sleep 1
Sep 02 15:54:26 puppet systemd[1]: Starting puppetserver Service...
Sep 02 15:54:26 puppet java[22306]: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
puppet version 4.6.1
The puppet master communicates with the other node using port number 8140.
I don't think a restart will help, since this looks like a connection issue between the server and the node.
Please try the following:
First, make sure that the puppet master is actually listening on port 8140. Run the following command on the puppetmaster:
netstat -ntlp | grep 8140
This command should return something like this:
tcp 0 0 0.0.0.0:8140 0.0.0.0:* LISTEN 1783/puppetmaster
If you don't get the same output, your puppetmaster is not listening and therefore cannot compile catalogs for the node.
Try checking the puppet master log at /var/log/puppetmaster.log
Next, check that the node can communicate with the puppetmaster on the relevant port. You can check this quickly with the telnet command. Run this on your node:
telnet <puppetmaster IP address or DNS name> 8140
You should get something like:
Connected to <puppet-master-IP/DNS-name>
Escape character is '^]'.
If you don't get this output, something is blocking you from reaching the puppetmaster. Try opening the port in your firewall.
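For example, if the puppetmaster happens to use ufw (adjust for iptables or firewalld as appropriate), opening the port would look like:
sudo ufw allow 8140/tcp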
If you're still stuck, try using the --debug flag for verbose output and edit your question.
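That is, the same agent run as above, with debugging enabled:
sudo /opt/puppetlabs/bin/puppet agent -t --debug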
It could be one of two things: (1) the puppetserver JVM settings request more memory than your machine has, or (2) you installed both apt-get install puppetserver and apt-get install puppet.
If you get a failed to start puppet.service: unit not found error on the agent (slave) machine while connecting to the puppet master:
Close the PuTTY session, then open it and connect again. The issue won't recur when starting PuTTY on the agent.
The error occurs because there is not enough RAM. To fix it, open the Puppet Server configuration file:
sudo nano /etc/sysconfig/puppetserver
And reduce the amount of allocated RAM for the Puppet server (for example, I specified 512m instead of 2g):
JAVA_ARGS="-Xms512m -Xmx512m"
Now let’s start the Puppet server:
sudo systemctl start puppetserver
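To confirm it came up cleanly, the same status command used earlier should now show it as active (running):
sudo systemctl status puppetserver.service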

systemd restart service on watchdog does not terminate previous hung instance

I'm trying to set up a systemd service configuration that restarts the service on watchdog failure. If my application does not call sd_notify() in time, systemd spawns a new instance.
However, the previous instance is not killed. After some time, I have many instances of my application running.
$ systemctl status my-daemon.service
Loaded: loaded (/lib/systemd/system/my-daemon.service; disabled)
Active: active (running) since Tue, 26 Aug 2014 10:27:46 +0000; 7s ago
Main PID: 1433 (attendance-syst)
CGroup: name=systemd:/system/my-daemon.service
├ 1281 /usr/local/bin/my-daemon
├ 1384 /usr/local/bin/my-daemon
├ 1407 /usr/local/bin/my-daemon
└ 1433 /usr/local/bin/my-daemon
...
This is part of my service file:
[Service]
ExecStart=/usr/local/bin/my-daemon
TimeoutStopSec=5
WatchdogSec=10
Restart=on-failure
How can I configure systemd to kill instances that fail the watchdog?
I have already read the manual page, but it didn't help.
I thought Restart=on-failure would restart a hung process by default...
It's a bug and it's already fixed in newer versions of systemd.
In systemd 208 (available for debian jessie) it works correctly.
In systemd 204 (available for debian wheezy via backports) it's still broken.
I haven't found the exact release where they fixed it.
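For reference, on a systemd new enough to contain the fix (208 or later, per the above), a unit along these lines is expected to kill the hung instance when the watchdog fires and then start a replacement. This is just the question's fragment filled out as a sketch, not a verified configuration:
[Unit]
Description=my-daemon with watchdog supervision

[Service]
ExecStart=/usr/local/bin/my-daemon
# the daemon must send WATCHDOG=1 via sd_notify() more often than this interval
WatchdogSec=10
# on watchdog expiry systemd stops the hung main process (escalating to SIGKILL
# after TimeoutStopSec) before Restart= brings up the replacement
TimeoutStopSec=5
Restart=on-failure

[Install]
WantedBy=multi-user.target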