Why does the pytest-threadleak fixture fail for daemon threads? - pytest

I just added a session-scoped pytest server fixture that spawns a daemon thread for tests to use.
However, this causes the pytest plugin pytest-threadleak to fail. Daemon threads should terminate automatically when the caller finishes.
In the plugin's own test suite, https://github.com/nirs/pytest-threadleak/blob/master/test_threadleak.py, a daemon thread is used as a true positive for thread leaking. Why is that?
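For context, a minimal sketch of the kind of fixture described, where _serve_forever and the fixture name are hypothetical stand-ins for the real server:
import threading
import time

import pytest

def _serve_forever():
    # Hypothetical stand-in for the real server loop.
    while True:
        time.sleep(1)

@pytest.fixture(scope="session")
def server():
    # The daemon flag means the thread will not block interpreter exit,
    # but it stays alive after each test, so pytest-threadleak counts it
    # as a leaked thread.
    thread = threading.Thread(target=_serve_forever, daemon=True)
    thread.start()
    yield thread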

Related

Is it mandatory to run Celery as a daemon in production?

I have seen in the Celery documentation that it's advisable to run Celery as a daemon process. In my case each Celery worker is a Docker container whose sole purpose is to execute Celery tasks. In that scenario too, is it recommended to run it as a daemon process?
No, if the Celery worker runs inside a container there is no need to run it as a daemon.

How to safely restart Airflow and kill a long-running task?

I have Airflow running in Kubernetes using the CeleryExecutor. Airflow submits and monitors Spark jobs using the DatabricksOperator.
My streaming Spark jobs have a very long runtime (they run forever unless they fail or are cancelled). When the Airflow worker pods are killed while a streaming job is running, the following happens:
Associated task becomes a zombie (running state, but no process with heartbeat)
Task is marked as failed when Airflow reaps zombies
Spark streaming job continues to run
How can I force the worker to kill my Spark job before it shuts down?
I've tried killing the Celery worker with a TERM signal, but apparently that causes Celery to stop accepting new tasks and wait for current tasks to finish (docs).
You need to be clearer about the issue. If you are saying that the Spark cluster finishes the jobs as expected but the on_kill function is not called, that's expected behavior. As per the docs, the on_kill function is for cleaning up after the task gets killed:
def on_kill(self) -> None:
"""
Override this method to cleanup subprocesses when a task instance
gets killed. Any use of the threading, subprocess or multiprocessing
module within an operator needs to be cleaned up or it will leave
ghost processes behind.
"""
In your case, when you manually kill the job, it is doing what it has to do.
Now if you want a cleanup even after successful completion of the job, override the post_execute function.
As per the docs, post_execute is:
def post_execute(self, context: Any, result: Any = None):
"""
This hook is triggered right after self.execute() is called.
It is passed the execution context and any results returned by the
operator.
"""

Airflow: what do `airflow webserver`, `airflow scheduler` and `airflow worker` exactly do?

I've been working for a while now with Airflow, which was set up by a colleague. Lately I've run into several errors, which require me to understand in more depth how to fix certain things within Airflow.
I do understand what the three processes are; I just don't understand what happens under the hood when I run them. What exactly happens when I run one of the commands? Can I see somewhere afterwards that they are running? And if I run one of these commands, does this overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen?
Each process does what it is built to do while it is running (the webserver provides a UI, the scheduler determines when things need to be run, and the workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but each is a standalone command that starts a process. I.e. starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. When you run airflow webserver, it starts a Python Flask app. While that process is running, the webserver is running; if you kill the process, it goes down.
All three have to be running for Airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever have one scheduler running, but if you were to run two airflow webserver processes (ignoring port conflicts), you would then have two separate HTTP servers running against the same metadata database. Workers are a little different, in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you create multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instances with the status of each task.
When you run any of these commands you'll see the stdout and stderr output in the console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you Ctrl+C, you are sending a signal to kill the process. Ideally, for a production Airflow cluster, you should have some supervisor monitoring the processes and ensuring that they are always running. Locally, you can either run the commands in the foreground of separate shells, minimize them, and just keep them running when you need them, or run them as background daemons with the -D argument, e.g. airflow webserver -D.

systemd `systemctl stop` aggressively kills subprocesses

I have a daemon-like process that starts two subprocesses (and one of the subprocesses starts ~10 others). When I systemctl stop my process, the child subprocesses appear to be 'aggressively' killed by systemd, which doesn't give my process a chance to clean up.
How do I get systemctl stop to skip the aggressive kill and allow my process to orchestrate an orderly cleanup?
I tried TimeoutSec=30 to no avail.
KillMode= defaults to control-group. That means every process of your service is killed with SIGTERM.
You have two options:
Handle SIGTERM in each of your processes and shut down within TimeoutStopSec (which defaults to 90 seconds); a sketch of this follows below.
If you really want to delegate the shutdown from your main process, set KillMode=mixed. SIGTERM will be sent to the main process only. Then, again, shut down within TimeoutStopSec. If you do not shut down within TimeoutStopSec, systemd will send SIGKILL to all your processes.
Note: I suggest using KillMode=mixed in option 2 instead of KillMode=process, as the latter would send the final SIGKILL only to your main process, which means your subprocesses would not be killed if they've locked up.
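For option 1, a minimal sketch of handling SIGTERM in a Python process, where _cleanup is a hypothetical placeholder for your actual shutdown logic:
import signal
import sys
import time

def _cleanup():
    # Hypothetical placeholder: stop worker threads, terminate child
    # processes, flush state, and so on.
    pass

def _handle_sigterm(signum, frame):
    # systemd sends SIGTERM on `systemctl stop`; clean up and exit well
    # within TimeoutStopSec, or systemd will follow up with SIGKILL.
    _cleanup()
    sys.exit(0)

signal.signal(signal.SIGTERM, _handle_sigterm)

# Main loop of the daemon-like process.
while True:
    time.sleep(1)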
A late (possible) answer, but as I googled for weeks with a similar issue and found nothing, I figured I'd add my solution.
My error was that I ran the systemd unit as root and switched (using sudo) to "the correct" user in the start script (inherited from a SysVinit script).
That starts the processes in the user.slice, which is killed mercilessly on shutdown. When I changed the unit file to run as the correct user (User=myuser) and removed sudo from the start script, the processes started in the system.slice and were handled properly on shutdown.

Quartz scheduler shutdown(true) wait for all threads started from running Jobs to stop?

If I have a job and from that job I create some threads, what happens when I call scheduler.shutdown(true)?
Will the scheduler wait for all of my threads to finish or not?
Quartz 1.8.1 API docs:
Halts the Scheduler's firing of Triggers, and cleans up all resources associated with the Scheduler.
Parameters:
waitForJobsToComplete - if true the scheduler will not allow this method to return until all currently executing jobs have completed.
Quartz neither knows nor cares about any threads spawned by your job; it will simply wait for the job to complete. If your job spawns new threads and then exits, then as far as Quartz is concerned, it's finished.
If your job needs to wait for its spawned threads to complete, then you need to use something like an ExecutorService (see the Javadoc for java.util.concurrent), which will allow the job thread to wait for its spawned tasks to complete. If you're using raw Java threads, use Thread.join().