Pausing an os.system command over a Slurm scheduler if the first command doesn't allocate

I am running a script multiple times over a multinode cluster, and this script processes data sequentially over the cluster. Here is the code:
import os
os.system("srun -p rs2 --mem-per-cpu 200G -t 7-23:00:00 python3 /home/usr/Sim/sim.py aok; srun -p rs2 python3 boguspython.py")
The issue is that if the first statement, i.e.
srun -p rs2 --mem-per-cpu 200G -t 7-23:00:00 python3 /home/usr/Sim/sim.py aok
needs to wait for resources to be allocated, the second statement executes immediately. The second command relies on the first being fully executed. Is there a way to make the second statement wait until the first has been allocated resources and has fully finished?

I would suggest using subprocess.Popen instead of os.system. This will let you use .wait(), which blocks until the command completes and lets you check its exit status.
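A minimal sketch of that approach, reusing the two srun commands from the question:
import subprocess

# Start the first srun; wait() blocks until the job has been allocated and has finished.
first = subprocess.Popen(
    ["srun", "-p", "rs2", "--mem-per-cpu", "200G", "-t", "7-23:00:00",
     "python3", "/home/usr/Sim/sim.py", "aok"])
first.wait()

# Run the second command only once the first has exited successfully.
if first.returncode == 0:
    second = subprocess.Popen(["srun", "-p", "rs2", "python3", "boguspython.py"])
    second.wait()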

Related

Why does docker run exit my terminal session?

I am running Docker Desktop 3.5.1 on macOS Big Sur and I am totally confused about the following behaviour:
If I run docker run -it --rm postgres psql --help I get the psql usage information (all as expected) and I can continue to run commands in my terminal. Edit to clarify: the docker container exits and terminates as expected, but my zsh session remains active (also as expected).
However, if I run psql with an invalid flag, say, docker run -it --rm postgres psql -m then I get
/usr/lib/postgresql/13/bin/psql: invalid option -- 'm'
Try "psql --help" for more information.
[Process completed]
and my terminal session exits. Edit to clarify: the docker container exits as expected, but it takes the host zsh session with it (unexpected).
What I'm trying to work out is why my terminal session exits, and how I can avoid this happening.
To keep a session open you can execute bash like this:
docker run --rm -it postgres /bin/bash
Then you can run as many psql commands as you like and it won't exit unless bash exits.
Edit:
It seems terminal-closing behaviour can be configured in the OS:
https://stackoverflow.com/a/17910412/657477
Very weird behaviour, but @ErangaHeshan's comments pointed me to some nonsense inside my .zprofile file. As soon as that was commented out, psql in docker stopped taking down my host zsh session on exit.

Forcing LSF to execute jobs on different hosts

I have a setup consisting of 3 workers and a management node, which I use for submitting tasks. I would like to execute a setup script concurrently on all workers:
bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" mpirun setup.sh
As far as I understand, I could use the 'ptile' resource constraint to force execution on all workers:
bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" -R 'span[ptile=1]' mpirun setup.sh
However, occasionally I find that my script gets executed several times on the same worker.
Is this expected behavior, or is there a bug in my setup? Is there a better way to enforce multi-worker execution?
Your understanding of span[ptile=1] is correct. LSF will only use 1 core per host for your job. If there aren't enough hosts based on the -n, then the job will pend until something frees up.
However, occasionally I find that my script gets executed several times on the same worker.
I suspect that it's something in your script. For example, LSF appends to the stdout file by default; use -oo to overwrite it instead.
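For instance, a sketch of the submission above with -oo (the output file name is an illustrative placeholder; %J expands to the job ID):
bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" -R 'span[ptile=1]' -oo setup.%J.out mpirun setup.sh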

How to avoid a docker container stopping after the application is stopped

There is a docker container with a Postgres server. Once postgres is stopped or crashes (it doesn't matter which), I need to check some environment variables and the state of a few files.
By default, the container stops after the application finishes.
I know there is an option to change the default behavior in the Dockerfile, but I can no longer find it.
If somebody knows it, please give me a Dockerfile example like this:
FROM something
RUN something ...
ENTRYPOINT [something]
You can simply run a non-exiting process at the end of the entrypoint to keep the container alive, even if the main process exits.
For example use
tail -f 'some log file'
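A minimal sketch of that pattern, following the skeleton from the question (the application command and log path are hypothetical placeholders):
FROM something
RUN something ...
# run the application, then tail a log file so the container stays alive after the app stops
ENTRYPOINT ["/bin/bash", "-c", "run-the-app; tail -f /var/log/some.log"]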
There isn't an "option" to keep a container running when the main process has stopped or died. You can run something different in the container while debugging the actual startup scripts. Sometimes you need to override an entrypoint to do this.
docker run -ti $IMAGE /bin/sh
docker run -ti --entrypoint=/bin/sh $IMAGE
If the main process won't stay running when you docker start the existing container, then you won't be able to use that container interactively; otherwise you could:
docker start $CID
docker exec -ti $CID sh
For getting files from an existing container, you can docker cp anything you need from the stopped container.
docker cp $CID:/a/path /some/local/path
You can also docker export a tar archive of the complete container.
docker export $CID -o $CID.tar
tar -tvf $CID.tar | grep afile
The environment Docker injects can be seen with docker inspect, but this won't give you anything the process has added to the environment.
docker inspect $CID --format '{{ json .Config.Env }}'
In general, Docker requires a process to keep running in the foreground; otherwise, it assumes the application has stopped and shuts the container down. Below I outline a few ways I'm aware of that can prevent a container from stopping:
Use a process manager such as runit or systemd to run a process inside a container:
As an example, here you can find a Red Hat article about running systemd within a docker container.
A few possible approaches for debugging purposes:
a) Add an artificial sleep or pause to the entrypoint:
For example, in bash, you can use this to create an infinite pause:
while true; do sleep 1; done
b) For a fast workaround, one can run the tail command in the container:
As an example, with the command below we start a new container in detached/background mode (-d) and execute the tail -f /dev/null command inside the container. As a result, this forces the container to run forever.
docker run -d ubuntu:18.04 tail -f /dev/null
And if the main process crashed/exited, you may still look up the ENV variables or check out files with exec and basic commands like cd and ls. A few relevant commands for that:
docker inspect -f \
'{{range $index, $value := .Config.Env}}{{$value}} {{end}}' name-of-container
docker exec -it name-of-container bash

docker CMD run supervisord in background

Is there any way to run supervisord in the background, i.e. start the process and get out of the shell?
I have a Dockerfile where I try to run a script that is supposed to start postgresql and then exit, so that I have a process running and can create users.
Docker command
CMD ["/runprocess.sh"]
script runprocess.sh
#!/bin/bash
supervisord -c "/etc/supervisord.conf"
I have also tried to run it in the background, but no luck:
#!/bin/bash
supervisord -c "/etc/supervisord.conf" &
supervisord starts the process and just stays in the foreground forever.
I want it to start the process and exit, so I can run the rest of my script.
You can remove the nodaemon setting, or set it to false, in supervisord.conf:
[supervisord]
nodaemon=false ; must be set to true when running under Docker
This will make supervisord start in the background.

Why is sleep needed after fabric call to pg_ctl restart

I'm using Fabric to initialize a postgres server. I have to add a "sleep 1" at the end of the command or the postgres server processes die without explanation or an entry in the log:
sudo('%(pgbin)s/pg_ctl -D %(pgdata)s -l /tmp/pg.log restart && sleep 1' % env, user='postgres')
That is, I see this output on the terminal:
[dbserv] Executing task 'setup_postgres'
[dbserv] run: /bin/bash -l -c "sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl -D /data/pg -l /tmp/pg.log restart && sleep 1"
[dbserv] out: waiting for server to shut down.... done
[dbserv] out: server stopped
[dbserv] out: server starting
Without the && sleep 1, there's nothing in /tmp/pg.log (though the file is created), and no postgres processes are running. With the sleep, everything works fine.
(And if I execute the same command directly on the target machine's command line, it works fine without the sleep.)
Since it's working, it doesn't really matter, but I'm asking anyway: Does someone know what the sleep is allowing to happen and why?
You might also try setting the pty option to false and seeing whether it's related to how Fabric handles pseudo-ttys.
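A minimal sketch of that suggestion, reusing the command from the question (assuming Fabric 1.x, whose sudo() accepts a pty keyword argument):
from fabric.api import env, sudo

# pty=False disables the pseudo-tty, in case tearing down the tty is what kills the server
sudo('%(pgbin)s/pg_ctl -D %(pgdata)s -l /tmp/pg.log restart' % env,
     user='postgres', pty=False)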