Supervisor kills Prefect agent with SIGTERM unexpectedly


I'm using a Raspberry Pi 4, v10 (buster).
I installed supervisor per the instructions here: http://supervisord.org/installing.html
Except I changed "pip" to "pip3" because I want to monitor running things that use the python3 kernel.
I'm using Prefect, and the supervisord.conf is running the program with command=/home/pi/.local/bin/prefect "agent local start" (I tried this with and without double quotes)
Looking at the supervisord.log file, it seems the Prefect agent does start: I see the ASCII art that normally shows up when I start it from the command line. But then the log shows "terminated by SIGTERM; not expected" and "WARN received SIGTERM indicating exit request".
I saw this post: "Supervisor gets a SIGTERM for some reason, quits and stops all its processes", but I don't even have the 10Periodic file it references.
Does anyone know why or how Supervisor processes are getting killed by SIGTERM?

It could be that your process exits immediately because you don’t have an API key in your command and this is required to connect your agent to the Prefect Cloud API. Additionally, it’s a best practice to always assign a unique label to your agents, below is an example with “raspberry” as a label.
You can also check the logs/status:
supervisorctl status
Here is a configuration you can try. You can also specify a directory in your Supervisor config (I'm not sure whether the environment variables are needed, but I saw them in another Raspberry Pi user's Supervisor setup):
[program:prefect-agent]
command=prefect agent local start -l raspberry -k YOUR_API_KEY --no-hostname-label
directory=/home/pi/.local/bin/prefect
user=pi
environment=HOME="/home/pi/.local/bin/prefect",USER="pi"

Related

.NET Core / Kubernetes - SIGTERM, clean shutdown

I'm trying to verify that shutdown is completing cleanly on Kubernetes, with a .NET Core 2.0 app.
I have an app which can run in two "modes" - one using ASP.NET Core and one as a kind of worker process. Both use Console and JSON-which-ends-up-in-Elasticsearch-via-Filebeat-sidecar-container logger output which indicate startup and shutdown progress.
Additionally, I have console output which writes directly to stdout when a SIGTERM or Ctrl-C is received and shutdown begins.
Locally, the app works flawlessly - I get the direct console output, then the logger output flowing to stdout on Ctrl+C (on Windows).
My experiment scenario:
App deployed to GCS k8s cluster (using helm, though I imagine that doesn't make a difference)
Using kubectl logs -f to stream logs from the specific container
Killing the pod from GCS cloud console site, or deleting the resources via helm delete
Dockerfile is FROM microsoft/dotnet:2.1-aspnetcore-runtime and has ENTRYPOINT ["dotnet", "MyAppHere.dll"], so not wrapped in a bash process or anything
Not specifying a terminationGracePeriodSeconds so guess it defaults to 30 sec
Observing output returned
Results:
The API pod log streaming showed just the immediate console output, "[SIGTERM] Stop signal received", not the other Console logger output about shutdown process
The worker pod log streaming showed a little more - the same console output and some Console logger output about shutdown process
The JSON logs didn't seem to pick any of the shutdown log output
My conclusions:
1. I don't know if Kubernetes is allowing the process to complete before terminating it, or just issuing SIGTERM and then killing things very quickly. I think it should be waiting, but then why is there no complete console logger output?
2. I don't know if console output is cut off during stdout log streaming at some point before the process finally terminates.
3. I would guess that the JSON output doesn't reach ES because the Filebeat sidecar terminates even if there are outstanding entries in the files to send.
I would like to know:
Can anyone advise on points 1,2 above?
Any ideas for a way to allow a little extra time or leeway for the sidecar to send stuff up, like a pod container termination order, delay on shutdown for that container, etc?

SIGTERM does indeed signal termination. The less obvious part is that when the SIGTERM handler returns, everything is considered finished.
The fix is to not return from the SIGTERM handler until the app has finished shutting down. For example, using a ManualResetEvent and Wait()ing it in the handler.
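The same goal (the process must stay alive until shutdown work has finished) can be sketched in Python. This is an analogy, not the .NET code the answer describes: Python's signal model differs, so here the handler only records the request and the main flow completes the cleanup before returning. The names run_until_terminated and flush_pending_work are hypothetical.

```python
import os
import signal
import threading

# Set by the signal handler; the main flow waits on it.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request; do not exit from inside the handler."""
    shutdown_requested.set()


def flush_pending_work(log):
    """Hypothetical cleanup step (flush logs, finish requests, and so on)."""
    log.append("flushed")


def run_until_terminated(log, poll_interval=0.05):
    """Work until SIGTERM arrives, then finish cleanup before returning."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not shutdown_requested.wait(poll_interval):
        log.append("tick")  # stand-in for normal work
    flush_pending_work(log)  # runs to completion before the process exits
    return log


if __name__ == "__main__":
    # Send ourselves SIGTERM shortly after start to demonstrate the flow.
    threading.Timer(0.2, os.kill, args=(os.getpid(), signal.SIGTERM)).start()
    print(run_until_terminated([])[-1])
```

Whatever the runtime, the cleanup still has to fit inside the grace period the orchestrator allows before it sends SIGKILL.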

I've started to look into this for my own purposes and have come across your question over a year after it was posted... This is a bit late, but have you tried GraceTerm?
There is an associated NuGet package for this.
From the description...
Graceterm middleware provides implementation to ensure graceful shutdown of AspNet Core applications. The basic concept is: After application received a SIGTERM (a signal asking it to terminate), Graceterm will hold it alive till all pending requests are completed or a timeout occur.
I haven't personally tried this yet, but it does look promising.

Try adding STOPSIGNAL SIGINT to your Dockerfile.

Airflow: what do `airflow webserver`, `airflow scheduler` and `airflow worker` exactly do?

I've been working with Airflow for a while now; it was set up by a colleague. Lately I've run into several errors that require me to understand in more depth how to fix certain things within Airflow.
I do understand what the 3 processes are, I just don't understand the underlying things that happen when I run them. What exactly happens when I run one of the commands? Can I somewhere see afterwards that they are running? And if I run one of these commands, does this overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen then?

Each process does what it is built to do while it is running (the webserver provides a UI, the scheduler determines when things need to be run, and the workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but they are each standalone commands that start the processes that do the work. That is, starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. When you run airflow webserver, it starts a Python Flask app. While that process is running, the webserver is running; if you kill the command, it goes down.
All three have to be running for Airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever have one scheduler running, but if you were to run two airflow webserver processes (ignoring port conflicts), you would have two separate HTTP servers running against the same metadata database. Workers are a little different in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you create multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instance with the status of the task.
When you run any of these commands you'll see the stdout and stderr output in console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you press Ctrl+C, you are sending a signal (SIGINT) to stop the process. Ideally, for a production Airflow cluster, you should have some supervisor monitoring the processes and ensuring that they are always running. Locally, you can either run the commands in the foreground of separate shells, minimize them, and keep them running while you need them, or run them as background daemons with the -D argument, e.g. airflow webserver -D.

cf stop command does not perform graceful shutdown on bluemix

I have a node app in bluemix which holds some transaction cache in memory and I would like to flush this cache to DB before the application goes down. So I have the appropriate event handlers to intercept SIGTERM/SIGINT signals and all works fine from my laptop, however, it seems like the cf stop command does not perform graceful shutdown.
Unfortunately, there is no clear documentation on this topic. In one place in the Cloud Foundry app-lifecycle doc they do mention that SIGTERM is issued first, followed by a 10-second wait, but I'm not seeing this happen. It is probably a bug on their side. https://docs.cloudfoundry.org/devguide/deploy-apps/app-lifecycle.html
Has anyone else noticed this issue, and is there a workaround?

CF is sending the SIGTERM first but because of how the app is started by other processes, it's not being correctly propagated to your app.
As a workaround, disable App Management by setting the CF environment variable BLUEMIX_APP_MGMT_INSTALL=false and prefix your app's start command in your package.json file with 'exec' (e.g. exec node app.js).
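Why the exec prefix matters can be shown with a small, hedged Python sketch. It assumes a POSIX system with sh and sleep available, and it uses sleep as a stand-in for node app.js; the function name is hypothetical. With exec, the shell replaces itself with the command, so the PID the launcher holds belongs to the app and SIGTERM reaches it directly instead of stopping at a wrapper shell.

```python
import signal
import subprocess
import time


def stop_exec_wrapped_child(command="sleep 30", timeout=5):
    """Start `command` via `sh -c "exec ..."`, send SIGTERM, return the exit code."""
    # `exec` makes the shell replace itself with the command, so the PID
    # held by Popen is the command's own PID rather than a wrapper shell's.
    child = subprocess.Popen(["sh", "-c", f"exec {command}"])
    time.sleep(0.2)  # give the shell a moment to exec the command
    child.send_signal(signal.SIGTERM)
    # On POSIX, a negative return code means "terminated by that signal".
    return child.wait(timeout=timeout)


if __name__ == "__main__":
    print(stop_exec_wrapped_child())
```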

systemd `systemctl stop` aggressively kills subprocesses

I've a daemon-like process that starts two subprocesses (and one of the subprocesses starts ~10 others). When I systemctl stop my process the child subprocesses appear to be 'aggressively' killed by systemctl - which doesn't give my process a chance to clean up.
How do I get systemctl stop to quit the aggressive kill and thus to allow my process to orchestrate an orderly clean up?
I tried TimeoutSec=30 to no avail.

KillMode= defaults to control-group. That means every process of your service is killed with SIGTERM.
You have two options:
1. Handle SIGTERM in each of your processes and shut down within TimeoutStopSec (which defaults to 90 seconds).
2. If you really want to delegate the shutdown to your main process, set KillMode=mixed. SIGTERM will be sent to the main process only. Again, shut down within TimeoutStopSec. If you do not shut down within TimeoutStopSec, systemd will send SIGKILL to all your processes.
Note: I suggest using KillMode=mixed in option 2 instead of KillMode=process, as the latter would send the final SIGKILL only to your main process, which means your sub-processes would not be killed if they've locked up.
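With KillMode=mixed, the main process is responsible for stopping its own children once it receives SIGTERM. A minimal Python sketch of that orchestration follows, assuming the children are plain subprocesses (sleep 60 is a placeholder) and that grace_seconds is chosen to stay below TimeoutStopSec; all function names here are hypothetical.

```python
import signal
import subprocess


def start_children(count=2):
    """Spawn placeholder child processes (here: `sleep 60`)."""
    return [subprocess.Popen(["sleep", "60"]) for _ in range(count)]


def shut_down_children(children, grace_seconds=5):
    """Send SIGTERM to every child, escalating to SIGKILL on timeout."""
    for child in children:
        child.send_signal(signal.SIGTERM)
    exit_codes = []
    for child in children:
        try:
            exit_codes.append(child.wait(timeout=grace_seconds))
        except subprocess.TimeoutExpired:
            child.kill()  # child ignored SIGTERM; force it down
            exit_codes.append(child.wait())
    return exit_codes


def install_handler(children):
    """Make SIGTERM on the main process trigger an orderly child shutdown."""
    def on_sigterm(signum, frame):
        shut_down_children(children)
        raise SystemExit(0)
    signal.signal(signal.SIGTERM, on_sigterm)
```

In a real daemon, install_handler(start_children()) would run once at startup, and the per-child cleanup would replace the placeholder sleep processes.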

A late (possible) answer, but I googled for weeks with a similar issue and found nothing, so I figured I'd add my solution.
My error was that I ran the systemd unit as root and switched (using sudo) to "the correct" user in the start script (inherited from a SysVinit script).
That starts the processes in user.slice, which is killed mercilessly on shutdown. When I changed the unit file to run as the correct user (User=myuser) and removed sudo from the start script, the processes started in system.slice and were handled properly on shutdown.

UpStart initctl start|restart ubuntu

When using upstart on ubuntu how do I issue a command for starting a job if not running and restarting if already running. When deploying an app to a new node the job is not defined.
initctl restart JOB complains if not already running
initctl start JOB complains if already running.
I can script it to do
initctl start JOB
initctl restart JOB
But it doesn't seem to be the nicest thing to do.

I faced the same problem.
Short of a straight "lazy-stop-then-start" command built into initctl, we have to script it.
Invoke start and restart if it fails:
initctl start JOB || initctl restart JOB
This script is probably not the answer both of us were looking for but it is short enough to mention it.
As long as the service works nicely, it will do the trick.
When the service fails, this script fails twice. For example, if the service was stopped and actually fails to start, it will also fail to restart.
Definitely looking for an improvement to this.
I hope this helps.

I also tried the 'start or restart' method that hmalphettes suggested, but ran into trouble: with this approach, updates to the Upstart script were not applied. Instead I use this, which works as I would expect:
sudo stop JOB || true && sudo start JOB
This basically reads 'Stop the job if it's running, then start it.'
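If the same logic has to live in a deploy script rather than a shell one-liner, it can be sketched in Python as below. This is only an illustration: stop_then_start and its run parameter are hypothetical names, the script is assumed to run with sufficient privileges already (so sudo is omitted), and run defaults to subprocess.call so a fake runner can be substituted for a dry run.

```python
import subprocess


def stop_then_start(job, run=subprocess.call):
    """Mirror `stop JOB || true && start JOB`: ignore a failed stop, then start."""
    run(["stop", job])          # exit status deliberately ignored (the `|| true`)
    return run(["start", job])  # the overall result is the status of `start`


if __name__ == "__main__":
    # Dry run with a fake runner that only prints the commands.
    def echo(cmd):
        print(" ".join(cmd))
        return 0

    stop_then_start("JOB", run=echo)
```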

sudo service JOB restart
The service command was patched in Ubuntu to make it work the same on Upstart as it does in the most common cases on sysvinit.
initctl restart JOB
This has some unexpected effects and in general should be studied carefully before use. It is mostly there so you can restart a job without reloading the job definition, which is a really uncommon case.
