Snakemake: Singularity parameters --home and --bind set by default but disallowed on HPC

I have already posted this as an issue on GitHub at https://github.com/snakemake/snakemake/issues/279 but haven't received a response yet. I hope to find help here.
Version
I am using the following versions on our HPC cluster:
Snakemake v5.4.4
Singularity version 3.5.3
Minimal example
singularity: "docker://bash"

rule test:
    shell: "echo test"
Describe the bug
snakemake --use-singularity --debug
returns this message:
Building DAG of jobs...
Pulling singularity image docker://bash.
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 test
1
[Fri Mar 13 15:59:30 2020]
rule test:
jobid: 0
Activating singularity image /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg
FATAL: container creation failed: not mounting user requested home: user bind control is disallowed
[Fri Mar 13 15:59:30 2020]
Error in rule test:
jobid: 0
RuleException:
CalledProcessError in line 4 of /data/nanopore/test/Snakefile:
Command ' singularity exec --home /data/nanopore/test --bind /opt/snakemake/v5.4.4/lib/python3.5/site-packages/snakemake-5.4.4-py3.5.egg:/mnt/snakemake /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg bash -c 'set -euo pipefail; echo test'' returned non-zero exit status 255
File "/data/nanopore/test/Snakefile", line 4, in __rule_test
File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data/nanopore/test/.snakemake/log/2020-03-13T155917.601627.snakemake.log
Apparently, Snakemake runs Singularity with default values for --home and --bind; however, these options have been disallowed by our administrator.
Executing
singularity exec --home /data/nanopore/test --bind /opt/snakemake/v5.4.4/lib/python3.5/site-packages/snakemake-5.4.4-py3.5.egg:/mnt/snakemake /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg bash -c 'set -euo pipefail;'
returns:
FATAL: container creation failed: not mounting user requested home: user bind control is disallowed
Additional context
Is there a way to disable these default Singularity parameters in Snakemake? Inside the Singularity container, the /data directory is readable and writable anyway.
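A quick way to confirm that only the injected options trip the site policy is to rerun the generated command without them. A minimal sketch (paths copied from the log above; whether plain singularity exec without those options is permitted on your cluster is an assumption):

    # Same container and command as Snakemake's, but without --home and --bind:
    singularity exec \
        /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg \
        bash -c 'set -euo pipefail; echo test'

If that works, the failure is purely down to the options Snakemake adds; as far as I can tell, --singularity-args only appends further options to the command and cannot remove those defaults.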
Thanks a lot

Related

GitLab K8s Runner fails for get_sources

We are trying to move our gitlab-runners from standard CentOS VMs to Kubernetes. But after setup and registration, the pipeline fails with an unknown error:
Running with gitlab-runner 15.7.0 (259d2fd4)
on Kubernetes-local JXRw3mH1
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab-runner
Using Kubernetes executor with image gitlab-test.domain:5005/image:latest ...
Using attach strategy to execute scripts...
Preparing environment
00:04
Waiting for pod gitlab-runner/runner-jxrw3mh1-project-290-concurrent-0dpd88 to be running, status is Pending
Running on runner-jxrw3mh1-project-290-concurrent-0dpd88 via gitlab-runner-d7df6c548-hsgxg...
Getting source from Git repository
00:00
error: could not lock config file /root/.gitconfig: Read-only file system
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
Inside the log of the job pod we found:
helper Running on runner-jxrw3mh1-project-290-concurrent-0dpd88 via gitlab-runner-d7df6c548-hsgxg...
helper
helper {"command_exit_code": 0, "script": "/scripts-290-207166/prepare_script"}
helper error: could not lock config file /root/.gitconfig: Read-only file system
helper
helper {"command_exit_code": 1, "script": "/scripts-290-207166/get_sources"}
helper
helper {"command_exit_code": 0, "script": "/scripts-290-207166/cleanup_file_variables"}
Inside the log of the gitlab-runner pod we found:
Starting in container "helper" the command ["gitlab-runner-build" "<<<" "/scripts-290-207167/get_sources" "2>&1 | tee -a /logs-290-207167/output.log"] with script: #!/usr/bin/env bash
if set -o | grep pipefail > /dev/null; then set -o pipefail; fi; set -o errexit
set +o noclobber
: | eval $'export FF_CMD_DISABLE_DELAYED_ERROR_LEVEL_EXPANSION=$\'false\'\nexport FF_NETWORK_PER_BUILD=$\'false\'\nexport FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=$\'false\'\nexport FF_USE_DIRECT_DOWNLOAD
exit 0
job=207167 project=290 runner=JXRw3mH1
Remote process exited with the status: CommandExitCode: 1, Script: /scripts-290-207167/get_sources job=207167 project=290 runner=JXRw3mH1
Container "helper" exited with error: command terminated with exit code 1 job=207167 project=290 runner=JXRw3mH1
Notes:
- the error "error: could not lock config file /root/.gitconfig: Read-only file system" occurs because the current user inside the container is not root
- the file /logs-290-207167/output.log contains the log of the job pod
From a shell inside the job pod we also tested some git commands and successfully performed fetch and clone using our personal credentials (the same user that runs the pipeline from the GitLab GUI).
We think the problem may be related to the gitlab-ci-token, but we have reached the end of our investigation... :frowning:
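To narrow down whether this is purely a HOME/permission problem, it may help to inspect the helper container's identity and environment from inside a running job pod. A diagnostic sketch (pod and namespace names copied from the log above; the container name "helper" is taken from the runner log):

    kubectl -n gitlab-runner exec -it \
        runner-jxrw3mh1-project-290-concurrent-0dpd88 -c helper -- sh -c '
      id                                 # which UID/GID does the helper run as?
      echo "HOME=$HOME"                  # git writes .gitconfig under $HOME
      touch "$HOME/.gitconfig" 2>/dev/null \
        && echo "HOME is writable" \
        || echo "HOME is read-only"
    '

If the UID is non-root while HOME still points at /root, any git command that needs to write a config file will fail exactly as shown in the job log.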

My Auto DevOps pipeline in GitLab is not working properly

This is my test: I run the job with the Auto DevOps runner, without a .gitlab-ci.yaml file, but I receive an error like this:
$ if [[ -z "$CI_COMMIT_TAG" ]]; then # collapsed multi-line command
$ /build/build.sh
Building Heroku-based application using gliderlabs/herokuish docker image...
Attempting to pull a previously built image for use with --cache-from...
invalid reference format
invalid reference format
No previously cached image found. The docker build will proceed without using a cached image
invalid argument "/master:fa1708343b13496937aa567a1aecdc184f43d197" for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
Cleaning up file based variables
30:00
ERROR: Job failed: command terminated with exit code 1
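The leading slash in the rejected tag ("/master:fa1708…") suggests that the image name it should have been prefixed with was empty when the build script assembled the -t argument. A hedged check you could add to the job to verify this (the variable names are standard GitLab CI ones; that an empty CI_REGISTRY_IMAGE, e.g. from a disabled container registry, is the cause here is an assumption):

    # Print the pieces that normally make up the image tag:
    echo "CI_REGISTRY_IMAGE=${CI_REGISTRY_IMAGE:-<unset>}"
    echo "CI_COMMIT_REF_SLUG=${CI_COMMIT_REF_SLUG:-<unset>}"
    echo "CI_COMMIT_SHA=${CI_COMMIT_SHA:-<unset>}"
    # An empty CI_REGISTRY_IMAGE would produce a tag like "/master:<sha>".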

MissingOutputException in snakemake

I'm trying to run a peak-calling tool within a conda environment using Snakemake.
The Snakefile looks like this (I have only included the rules connected to the problem):
rule all:
    input:
        expand('{project}/{organism}/{mapper}/seacr/{pattern}.auc.threshold.bed', pattern = PATTERN, sample = IDS, organism = config['org'], project = config['project'], mapper = config['mapper'])

# SEACR - run the peak calling
rule seacr_run:
    input:
        IP = '{project}/{organism}/{mapper}/seacr/IP_{PATTERN}.bedgraph',
        IgG = '{project}/{organism}/{mapper}/seacr/IgG_{PATTERN}.bedgraph',
    output:
        bed1 = '{project}/{organism}/{mapper}/seacr/{PATTERN}.auc.threshold.bed',
    shell:
        '''
        bash /fs/home/yeroslaviz/SEACR/SEACR_1.3.sh {input.IP} 0.01 non stringent {output.bed1}
        '''
When running a dry run of the snakemake command with -nps, I get the correct command printed to STDOUT:
> snakemake -nps /fs/pool/pool-bcfngs/scripts/P193.ChipSeq.Snakemake -j 100
...
Building DAG of jobs...
Job counts:
count jobs
1 all
1 seacr_run
2
[Tue Mar 3 13:56:19 2020]
rule seacr_run:
input: P193/Mmu.GrCm38/bowtie2/seacr/IP_H3K4m3.bedgraph, P193/Mmu.GrCm38/bowtie2/seacr/IgG_H3K4m3.bedgraph
output: P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3.auc.threshold.bed
jobid: 22
wildcards: project=P193, organism=Mmu.GrCm38, mapper=bowtie2, PATTERN=H3K4m3
bash /fs/home/yeroslaviz/SEACR/SEACR_1.3.sh P193/Mmu.GrCm38/bowtie2/seacr/IP_H3K4m3.bedgraph 0.01 non stringent P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3.auc.threshold.bed
[Tue Mar 3 13:56:19 2020]
localrule all:
...
Job counts:
count jobs
1 all
1 seacr_run
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
When I run the command above on the command line, the tool works without problems. But when I try to run it within the Snakemake workflow, I get the following error:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 67 of /fs/pool/pool-bcfngs/scripts/P193.ChipSeq.Snakemake:
Missing files after 5 seconds:
P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3.auc.threshold.bed
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Can anyone explain what is happening?
Thanks
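One thing worth checking, based on how SEACR 1.3 names its results (an assumption on my part, not something visible in the log): the last argument is treated as an output prefix and SEACR appends its own suffix such as ".stringent.bed", so the file Snakemake waits for may never be created under that exact name. A quick manual check, using the paths from the dry run above:

    # Run the same command by hand and list what SEACR actually writes:
    bash /fs/home/yeroslaviz/SEACR/SEACR_1.3.sh \
        P193/Mmu.GrCm38/bowtie2/seacr/IP_H3K4m3.bedgraph \
        0.01 non stringent \
        P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3.auc.threshold.bed
    ls P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3*   # look for an extra suffix

If an extra suffix shows up, declaring that suffixed file as the rule's output (or passing a shorter prefix) would make the declared output match what actually lands on disk.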

Creating linked_dirs in Capistrano 3 fails

I am attempting to set up Capistrano with a SilverStripe build and am running into some trouble setting up the shared directories.
I set the linked_dirs in deploy.rb with the following:
set :linked_dirs, %w{assets vendor}
Since adding this line I get the following error:
[617afa7f] Command: /usr/bin/env mkdir -p /var/www/website/releases/20160215083713 /var/www/website/releases/20160215083713
INFO [617afa7f] Finished in 0.250 seconds with exit status 0 (successful).
DEBUG [88c3de20] Running /usr/bin/env [ -L /var/www/website/releases/20160215083713/assets ] as capistrano@128.199.231.152
DEBUG [88c3de20] Command: [ -L /var/www/website/releases/20160215083713/assets ]
DEBUG [88c3de20] Finished in 0.258 seconds with exit status 1 (failed).
DEBUG [3d61c1c4] Running /usr/bin/env [ -d /var/www/website/releases/20160215083713/assets ] as capistrano@128.199.231.152
DEBUG [3d61c1c4] Command: [ -d /var/www/website/releases/20160215083713/assets ]
DEBUG [3d61c1c4] Finished in 0.254 seconds with exit status 1 (failed).
INFO [3016a8cd] Running /usr/bin/env ln -s /var/www/website/shared/assets /var/www/website/releases/20160215083713/assets as capistrano@128.199.231.152
I am a mega noob when it comes to Capistrano and a semi noob when it comes to server configuration and permissions, so any pointers would be appreciated.
It probably hasn't actually failed. One thing to know about Capistrano is that (success) and (failed) simply report the exit status of the command: (success) if it is 0 and (failed) if it is non-zero.
If we look at the command in question, it says that /usr/bin/env [ -L /var/www/website/releases/20160215083713/assets ] failed. This command says "return 0 if /var/www/website/releases/20160215083713/assets exists and is a symlink (-L)". It fails, but that just means it returns non-zero, and thus the link needs to be created. Note that the next command, which checks whether the path is a directory (-d), also fails. And the last line in your output is actually creating the link in question.
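You can see the same thing locally; a tiny illustration (arbitrary paths, nothing Capistrano-specific):

    # The -L test exits non-zero when the path is not (yet) a symlink;
    # Capistrano just reports that exit status as (failed).
    [ -L /tmp/not-a-link-yet ]; echo "exit status: $?"    # prints 1
    ln -s /tmp /tmp/now-a-link
    [ -L /tmp/now-a-link ]; echo "exit status: $?"        # prints 0
    rm /tmp/now-a-link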
You can see the test in the Capistrano codebase here: https://github.com/capistrano/capistrano/blob/master/lib/capistrano/tasks/deploy.rake#L128
You can clean up and simplify the output with https://github.com/mattbrictson/airbrussh. This is developed by one of the primary Capistrano devs.
As a side note, all the green text in your terminal is stdout and the red text is stderr; this can also be confusing.

Hudson failing build w/o revealing cause

Every build has failed since Tuesday, and I'm not exactly sure what happened. The Phing targets (clean/prepare) are being executed properly, and the unit tests pass with flying colors, with only a warning for duplicate code (not a reason for a failure). I tried removing the phpDoc target to see if that was causing the error, but the build still failed.
Started by user chris
Updating file://localhost/projects/svn/ips-com/trunk
At revision 234
no change for file://localhost/projects/svn/ips-com/trunk since the previous build
[trunk] $ /opt/phing/bin/phing clean prepare -logger phing.listener.NoBannerLogger
Buildfile: /var/lib/hudson/.hudson/jobs/IPS/workspace/trunk/build.xml
IPS > clean:
[echo] Clean...
[delete] Deleting directory /var/lib/hudson/.hudson/jobs/IPS/workspace/build
IPS > prepare:
[echo] Prepare...
[mkdir] Created dir: /var/lib/hudson/.hudson/jobs/IPS/workspace/build
[mkdir] Created dir: /var/lib/hudson/.hudson/jobs/IPS/workspace/build/logs
[mkdir] Created dir: /var/lib/hudson/.hudson/jobs/IPS/workspace/build/logs/coverage
[mkdir] Created dir: /var/lib/hudson/.hudson/jobs/IPS/workspace/build/logs/coverage-html
[mkdir] Created dir: /var/lib/hudson/.hudson/jobs/IPS/workspace/build/docs
[mkdir] Created dir: /var/lib/hudson/.hudson/jobs/IPS/workspace/build/app
BUILD FINISHED
Total time: 1.0244 second
[workspace] $ /bin/bash -xe /tmp/hudson3259012225710915845.sh
+ cd trunk/tests
+ /usr/local/bin/phpunit --verbose -d memory_limit=512M --log-junit ../../build/logs/phpunit.xml --coverage-clover ../../build/logs/coverage/clover.xml --coverage-html ../../build/logs/coverage-html/
PHPUnit 3.5.0 by Sebastian Bergmann.
IPS
Default_IndexControllerTest .
Default_AuthControllerTest ......
Manage_UsersControllerTest .....
testDeleteInvalidUserId ..
testGetPermissionsForInvalidUserId ..
Audit_OverviewControllerTest ............
Time: 14 seconds, Memory: 61.00Mb
OK (28 tests, 198 assertions)
Writing code coverage data to XML file, this may take a moment.
Generating code coverage report, this may take a moment.
Warning: Unknown: Error occured while closing statement in Unknown on line 0
Warning: Unknown: Error occured while closing statement in Unknown on line 0
Warning: Unknown: Error occured while closing statement in Unknown on line 0
Warning: Unknown: Error occured while closing statement in Unknown on line 0
Warning: Unknown: Error occured while closing statement in Unknown on line 0
Warning: Unknown: Error occured while closing statement in Unknown on line 0
Warning: Unknown: Error occured while closing statement in Unknown on line 0
Warning: Unknown: Error occured while closing statement in Unknown on line 0
[workspace] $ /bin/bash -xe /tmp/hudson1439023061736436000.sh
+ /usr/local/bin/phpcpd --log-pmd ./build/logs/cpd.xml ./trunk
phpcpd 1.3.2 by Sebastian Bergmann.
Found 1 exact clones with 6 duplicated lines in 2 files:
library/Ips/Form/Decorator/SplitInput.php:8-14
library/Ips/Form/Decorator/FeetInches.php:10-16
0.04% duplicated lines out of 16585 total lines of code.
Time: 4 seconds, Memory: 19.50Mb
[DRY] Skipping publisher since build result is FAILURE
Publishing Javadoc
[xUnit] [INFO] - Starting to record.
[xUnit] [WARNING] - Can't create the path /var/lib/hudson/.hudson/jobs/IPS/workspace/generatedJUnitFiles. Maybe the directory already exists.
[xUnit] [INFO] - Processing PHPUnit-3.4 (default)
[xUnit] [INFO] - [PHPUnit-3.4 (default)] - 1 test report file(s) were found with the pattern 'build/logs/phpunit.xml' relative to '/var/lib/hudson/.hudson/jobs/IPS/workspace' for the testing framework 'PHPUnit-3.4 (default)'.
[xUnit] [INFO] - Converting '/var/lib/hudson/.hudson/jobs/IPS/workspace/build/logs/phpunit.xml' .
[xUnit] [INFO] - Stopping recording.
Publishing Clover coverage report...
Publishing Clover XML report...
Publishing Clover coverage results...
Finished: FAILURE
What changed since Tuesday? Try to manually run exactly the same commands that Hudson runs, from the same directory that Hudson starts them in (usually the job's workspace directory), and of course with the user account that Hudson runs under.
There are several possibilities, ranging from group membership on a directory, to permissions, to other things outside of Hudson. Was Hudson upgraded? Was a plugin upgraded? Was the OS or PHP upgraded? Was there a change in the default or user .profile or .env (or the equivalent files)? Does another process access the workspace?
Once I had a problem where, all of a sudden, my deployment scripts did not run anymore. The mystery was that I could still run the script from the command line with the Hudson user account. The reason was simple but took a while to uncover: there had been a Java upgrade from 5 to 6, and both versions were available. After comparing the environment variables, there was a difference in the PATH. The problem was that the new path was set in the global .profile, but Hudson does not open an interactive shell, so the .profile is never executed. If you have a problem like this, you can put the initialization in the .env file (or whatever the filename is on your system), because that is read regardless of whether the shell is interactive or not. Alternatively, you can configure Hudson to set the environment at the master or node/slave level.
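A practical way to do that comparison (an illustrative sketch, not specific to this job): dump the environment from a build step and diff it against an interactive login as the Hudson user on the same node:

    # Inside a Hudson "Execute shell" build step:
    env | sort > /tmp/hudson_env.txt

    # Logged in interactively as the Hudson user:
    env | sort > /tmp/interactive_env.txt
    diff /tmp/hudson_env.txt /tmp/interactive_env.txt    # look for PATH, JAVA_HOME, etc.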
If you want a command not to mark the build as a failure, you have to add a #! line in front of the command to prevent Hudson from adding the -xe flags, which produce this behaviour.
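For example, a shell build step that starts with its own interpreter line is run without the -x and -e flags, so a non-zero exit from one command no longer aborts the step (a sketch using the phpcpd call from the log above):

    #!/bin/bash
    # With an explicit interpreter line Hudson does not prepend -xe, so the
    # build step continues even if phpcpd exits non-zero.
    /usr/local/bin/phpcpd --log-pmd ./build/logs/cpd.xml ./trunk
    echo "phpcpd exit status: $?"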