Snakemake runs only the first rule, not the whole workflow

My snakefile looks like this.
rule do00_download_step01_download_:
    input:
    output:
        "data/00_download/scores.pqt"
    run:
        from lib.do00_download import do00_download_step01_download_
        do00_download_step01_download_()

rule do00_download_step02_get_the_mean_:
    input:
        "data/00_download/scores.pqt"
    output:
        "data/00_download/cleaned.pqt"
    run:
        from lib.do00_download import do00_download_step02_get_the_mean_
        do00_download_step02_get_the_mean_()

rule do01_corr_step01_correlate:
    input:
        "data/00_download/cleaned.pqt"
    output:
        "data/01_corr/corr.pqt"
    run:
        from lib.do01_corr import do01_corr_step01_correlate
        do01_corr_step01_correlate()

rule do95_plot_step01_correlations:
    input:
        "data/01_corr/corr.pqt"
    output:
        "plot/heatmap.png"
    run:
        from lib.do95_plot import do95_plot_step01_correlations
        do95_plot_step01_correlations()

rule do95_plot_step02_plot_dist:
    input:
        "data/00_download/cleaned.pqt"
    output:
        "plot/dist.png"
    run:
        from lib.do95_plot import do95_plot_step02_plot_dist
        do95_plot_step02_plot_dist()

rule do99_figures_step01_make_figure:
    input:
        # The comma matters: adjacent string literals would be concatenated.
        "plot/dist.png",
        "plot/heatmap.png"
    output:
        "figs/fig01.svg"
    run:
        from lib.do99_figures import do99_figures_step01_make_figure
        do99_figures_step01_make_figure()

rule all:
    input:
        "figs/fig01.svg"
I have arranged the rules sequentially, hoping that this would make all the steps run in that order. However, when I run snakemake, it only runs the first rule and then exits.
I have individually checked that all the steps (the functions that I import) work, and that the paths of the input and output files are correct. Everything looks OK, so I am guessing that the issue is with how I formatted the Snakefile. I am new to Snakemake (beginner level), so it would be very helpful if somebody could point out how I should fix this issue.

This is the intended behavior. Here's the relevant section from the docs:
Moreover, if no target is given at the command line, Snakemake will define the first rule of the Snakefile as the target. Hence, it is best practice to have a rule all at the top of the workflow which has all typically desired target files as input files.
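Since you give no target on the command line, the first rule in your Snakefile (do00_download_step01_download_) is taken as the target. Snakemake builds its output and stops, because nothing else is needed to produce it.
If you move rule all: to the top of your Snakefile, it should work as expected. Alternatively, name the target explicitly on the command line, e.g. snakemake all.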
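To see why the position of rule all matters, here is a toy model in plain Python. It is only a sketch of the idea described in the docs, not Snakemake's actual implementation: the function names (default_target, rules_to_run) and the shortened file names are made up for illustration.

```python
def default_target(rules):
    """Return the first rule's name (dicts keep insertion order in Python 3.7+)."""
    return next(iter(rules))


def rules_to_run(rules, target):
    """List the rules needed for `target`, dependencies first."""
    # Map every output file to the rule that produces it.
    producers = {out: name for name, rule in rules.items() for out in rule["output"]}
    order = []

    def visit(name):
        # Visit the producers of all inputs before the rule itself.
        for path in rules[name]["input"]:
            producer = producers.get(path)
            if producer is not None and producer not in order:
                visit(producer)
        if name not in order:
            order.append(name)

    visit(target)
    return order


# Hypothetical three-rule workflow, loosely modelled on the question.
download = {"input": [], "output": ["scores.pqt"]}
clean = {"input": ["scores.pqt"], "output": ["cleaned.pqt"]}
final = {"input": ["cleaned.pqt"], "output": []}

rules_all_last = {"download": download, "clean": clean, "all": final}
rules_all_first = {"all": final, "download": download, "clean": clean}

print(rules_to_run(rules_all_last, default_target(rules_all_last)))
# ['download']
print(rules_to_run(rules_all_first, default_target(rules_all_first)))
# ['download', 'clean', 'all']
```

The order in which rules are written only decides which rule becomes the default target. The execution order is then worked out backwards from that target through the input and output files.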

Why might I get expected: MappingNode was SequenceNode during kpt pkg get?

I am following https://www.kubeflow.org/docs/distributions/gke/deploy/deploy-cli/ and the step bash ./pull-upstream.sh fails. I have isolated the problem to a single command inside the script:
kpt pkg get https://github.com/zijianjoy/pipelines.git/manifests/kustomize/#upgradekpt upstream
When I run this command alone, I get the same error as when it runs in the script:
Package "upstream":
Fetching https://github.com/zijianjoy/pipelines#upgradekpt
From https://github.com/zijianjoy/pipelines
* branch upgradekpt -> FETCH_HEAD
Adding package "manifests/kustomize".
Fetched 1 package(s).
Error: /home/tester_user/gcp-blueprints/kubeflow/apps/pipelines/upstream/third-party/argo/upstream/manifests/namespace-install/overlays/argo-server-deployment.yaml: wrong Node Kind for expected: MappingNode was SequenceNode: value: {- op: add
path: /spec/template/spec/containers/0/args/-
value: --namespaced}
I made some mistakes while following the setup instructions (which I think I corrected), so this could be caused by something I did. Even so, it would be good to know why this error happens, for my general understanding.
I am on Google Cloud Platform, using the command-line prompt that is built into the web UI.

Matlab test runner skips test functions that try to import non-existing file

Let's define a simple test class
classdef test_file < matlab.unittest.TestCase
    methods(Test)
        function test_function(testCase)
            import some_package.some_function
            testCase.verifyEqual(true,some_function(0));
        end
    end
end
What some_function does is irrelevant. The function some_package.some_function does not exist on my path (for example, because I forgot to add it when pushing a commit). Whenever I try to run the test file above, the test test_function is skipped with this warning:
Warning: "test_file.m" was excluded.
Caused by:
Error: File: test_file.m Line: 4 Column: 20
The import statement 'import sc_force_models.apparent_accel' cannot be found or cannot be imported. Imported names must
end with '.*' or be fully qualified.
Since the test is skipped, the problem goes undetected and the test runner reports 0 errors. If someone forgets to commit a file, I would still like to detect this during testing, so the expected behavior is for the test case to fail.
As a workaround I tried passing the 'Strict',true argument to the test runner, but that does not detect the issue either. I also tried wrapping the import statement in a try/catch block, but it seems that none of the code in the file is executed.
Any ideas on how to detect incorrect import statements in test cases?

How to copy specific headers in Soong

How can I extract specific files in Soong to use as headers?
I was recently writing a blueprint (Android.bp) file for paho-mqtt-c. My consumers require MQTTClient.h which paho-mqtt-c stores in src/ - what I would consider a "private" location. Reading their CMakeLists.txt file, it actually installs this and some other headers to include/.
As far as I can tell, Soong doesn't have this concept of installing so it seems like I could export_include_dirs the src directory - which seems wrong, or use a cc_genrule to copy these headers elsewhere.
But that's where I hit another issue: I can't seem to figure out how to create a cc_genrule that takes n inputs and writes n outputs (n-to-n). For example,
cc_genrule {
    name: "paho_public_headers",
    cmd: "cp $(in) $(out)",
    srcs: [ "src/MQTTAsync.h", "src/MQTTClient.h", "src/MQTTClientPersistence.h", "src/MQTTLogLevels.h" ],
    out: [ "public/MQTTAsync.h", "public/MQTTClient.h", "public/MQTTClientPersistence.h", "public/MQTTLogLevels.h" ],
}
results in the failed command cp <all-inputs> <all-outputs>, rather than what I wanted, which would be closer to running the command once for each input/output pair.
My solution was simply to write four cc_genrules, but that doesn't seem great either.
Is there a better way? (ideally without writing a custom tool)
The solution was to use gensrcs with a shard_size of 1.
For example:
gensrcs {
    name: "paho_public_headers",
    cmd: "mkdir -p $(genDir) && cat $(in) > $(out)",
    srcs: [":paho_mqtt_c_header_files"],
    output_extension: "h",
    shard_size: 1,
    export_include_dirs: ["src"],
}
The export_include_dirs is important: with gensrcs there is little control over the output filename other than the extension, so it is easiest to let each $(out) file keep the directory structure of its $(in) file and export that directory (src).

Snakemake: Cluster multiple jobs together

I have a pretty simple Snakemake pipeline that takes an input file and runs three consecutive steps to produce one output. Each individual job is very quick. Now I want to apply this pipeline to >10k files on an SGE cluster. Even if I use group to combine the three rules into one job per input file, I would still submit >10k cluster jobs. Is there a way to submit a limited number of cluster jobs instead (let's say 100) and distribute all tasks equally among them?
An example would be something like
rule A:
    input: "{prefix}.start"
    output: "{prefix}.A"
    group: "mygroup"

rule B:
    input: "{prefix}.A"
    output: "{prefix}.B"
    group: "mygroup"

rule C:
    input: "{prefix}.B"
    output: "{prefix}.C"
    group: "mygroup"

rule runAll:
    input: expand("{prefix}.C", prefix = VERY_MANY_PREFIXES)
and then run it with
snakemake --cluster "qsub <some parameters>" runAll
You could process all the 10k files in the same rule using a for loop (not sure if this is what Manavalan Gajapathy has in mind). For example:
rule A:
    input:
        txt= expand('{prefix}.start', prefix= PREFIXES),
    output:
        out= expand('{prefix}.A', prefix= PREFIXES),
    run:
        io= zip(input.txt, output.out)
        for x in io:
            shell('some_command %s %s' %(x[0], x[1]))
and the same for rule B and C.
See also Snakemake's localrules directive.
The only solution I can think of would be to declare rules A, B, and C to be local rules, so that they run in the main snakemake job instead of being submitted as a job. Then you can break up your runAll into batches:
rule runAll1:
    input: expand("{prefix}.C", prefix = VERY_MANY_PREFIXES[:1000])

rule runAll2:
    input: expand("{prefix}.C", prefix = VERY_MANY_PREFIXES[1000:2000])

rule runAll3:
    input: expand("{prefix}.C", prefix = VERY_MANY_PREFIXES[2000:3000])

...etc
Then you submit one snakemake job for runAll1, another for runAll2, and so on. You can do this fairly easily with a bash loop:
for i in {1..10}; do sbatch [sbatch params] snakemake runAll$i; done;
Another option, which would be more scalable than creating multiple runAll rules, would be to have a helper Python script that does something like this:
import subprocess

# VERY_MANY_PREFIXES must be defined here the same way as in the Snakefile.
for i in range(0, len(VERY_MANY_PREFIXES), 1000):
    targets = [f'{prefix}.C' for prefix in VERY_MANY_PREFIXES[i:i+1000]]
    subprocess.run(['sbatch', 'snakemake'] + targets)
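The helper above uses a fixed batch size of 1000. If you would rather fix the number of cluster jobs (the question mentions 100) and spread the targets evenly, the batching step could look like the sketch below. The function name split_into_batches and the sample prefixes are made up for illustration, and the actual submission command is left out because it depends on your cluster.

```python
def split_into_batches(items, n_batches):
    """Split items into contiguous batches whose sizes differ by at most one."""
    # Never create more batches than there are items (but always at least one).
    n_batches = min(n_batches, len(items)) or 1
    base, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item each.
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


# Hypothetical prefixes standing in for VERY_MANY_PREFIXES.
prefixes = [f"sample{i:05d}" for i in range(10_250)]
batches = split_into_batches(prefixes, 100)

# One list of target files per cluster job.
targets_per_job = [[f"{prefix}.C" for prefix in batch] for batch in batches]
print(len(targets_per_job), len(targets_per_job[0]), len(targets_per_job[-1]))
# 100 103 102
```

Each entry of targets_per_job can then be appended to the snakemake command line of one submitted job, as in the helper script above.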

How to pass command-line arguments in CTest at runtime

I'm using CTest and want to pass command-line arguments to the underlying tests at runtime. I know there are ways to hard code command-line arguments into the CMake/CTest script, but I want to specify the command-line arguments at runtime and have those arguments passed through CTest to the underlying test.
Is this even possible?
I've figured out a way to do it (using the Fundamental theorem of software engineering). It's not as simple as I'd like, but here it is.
First, create a file ${CMAKE_SOURCE_DIR}/cmake/RunTests.cmake with the content
if(NOT DEFINED ENV{TESTS_ARGUMENTS})
    set(ENV{TESTS_ARGUMENTS} "--default-arguments")
endif()
execute_process(COMMAND ${TEST_EXECUTABLE} $ENV{TESTS_ARGUMENTS} RESULT_VARIABLE result)
if(NOT "${result}" STREQUAL "0")
    message(FATAL_ERROR "Test failed with return value '${result}'")
endif()
Then, when you add the test, use
add_test(
    NAME MyTest
    COMMAND ${CMAKE_COMMAND} -DTEST_EXECUTABLE=$<TARGET_FILE:MyTest> -P ${CMAKE_SOURCE_DIR}/cmake/RunTests.cmake
)
Finally, you can run the test with custom arguments using
cmake -E env TESTS_ARGUMENTS="--custom-arguments" ctest
Note that if you use bash, you can simplify this to
TESTS_ARGUMENTS="--custom-arguments" ctest
There are some problems with this approach, e.g. it ignores the WILL_FAIL property of the tests. Of course I wish it could be as simple as calling ctest -- --custom-arguments, but, as the Stones said, You can't always get what you want.
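The same indirection can be written as a small wrapper script in another language. The sketch below uses Python instead of a CMake script. It is an alternative under stated assumptions, not part of the setup above: the names build_command and main are made up, and it splits TESTS_ARGUMENTS on shell-style whitespace with shlex, which is a choice of this sketch.

```python
import os
import shlex
import subprocess
import sys


def build_command(executable, environ, default="--default-arguments"):
    """Build the test command line from the TESTS_ARGUMENTS environment variable."""
    extra = environ.get("TESTS_ARGUMENTS", default)
    # shlex.split honours quotes, so TESTS_ARGUMENTS='-e 2 --name "a b"' works.
    return [executable] + shlex.split(extra)


def main(argv):
    """Run the test executable given as the only argument; return its exit code."""
    if len(argv) != 2:
        print("usage: wrapper TEST_EXECUTABLE", file=sys.stderr)
        return 2
    return subprocess.run(build_command(argv[1], os.environ)).returncode


# When saved as a script, finish with: sys.exit(main(sys.argv))
print(build_command("./MyTest", {"TESTS_ARGUMENTS": "-e 2 -em 5"}))
# ['./MyTest', '-e', '2', '-em', '5']
```

The test would then be registered with the Python interpreter and this script as its COMMAND, passing $<TARGET_FILE:MyTest> as the single argument. Because the wrapper returns the executable's own exit code, CTest sees a normal pass or fail result.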
I'm not sure I fully understand what you want, but I still can give you a way to pass arguments to tests in CTest, at runtime.
I'll give you an example, with CTK (the Common Toolkit, https://github.com/commontk/CTK):
In the build directory (e.g. CTK-build/CTK-build, since it's a superbuild), I run the following, where -V means verbose and -N lists the tests without running them:
ctest -R ctkVTKDataSetArrayComboBoxTest1 -V -N
I get:
UpdateCTestConfiguration from : /CTK-build/CTK-build/DartConfiguration.tcl
Parse Config file:/CTK-build/CTK-build/DartConfiguration.tcl
Add coverage exclude regular expressions.
Add coverage exclude: /CMakeFiles/CMakeTmp/
Add coverage exclude: .*/moc_.*
Add coverage exclude: .*/ui_.*
Add coverage exclude: .*/Testing/.*
Add coverage exclude: .*/CMakeExternals/.*
Add coverage exclude: ./ctkPixmapIconEngine.*
Add coverage exclude: ./ctkIconEngine.*
UpdateCTestConfiguration from :/CTK-build/CTK-build/DartConfiguration.tcl
Parse Config file:/CTK-build/CTK-build/DartConfiguration.tcl
Test project /CTK-build/CTK-build
Constructing a list of tests
Done constructing a list of tests
178: Test command: /CTK-build/CTK-build/bin/CTKVisualizationVTKWidgetsCppTests "ctkVTKDataSetArrayComboBoxTest1"
Labels: CTKVisualizationVTKWidgets
Test #178: ctkVTKDataSetArrayComboBoxTest1
Total Tests: 1
You can copy-paste the "Test command" in your terminal:
/CTK-build/CTK-build/bin/CTKVisualizationVTKWidgetsCppTests "ctkVTKDataSetArrayComboBoxTest1"
And add the arguments, for example "-I" for interactive testing:
/CTK-build/CTK-build/bin/CTKVisualizationVTKWidgetsCppTests "ctkVTKDataSetArrayComboBoxTest1" "-I"
Tell me if it helps.
matthieu's answer gave me the clue to get it to work for me.
For my code I did the following:
Type the command ctest -V -R TestMembraneCellCrypt -N to get the output:
...
488: Test command: path/to/ctest/executable/TestMembraneCellCrypt
Labels: Continuous_project_ChasteMembrane
Test #488: TestMembraneCellCrypt
...
Then I copied the Test command and provided the arguments there:
path/to/ctest/executable/TestMembraneCellCrypt -e 2 -em 5 -ct 10
I'll note that the package I'm using (Chaste) is pretty large, so there might be things going on that I don't know about.