I use Snakemake to process 8 fastq files in parallel. Each of these files is then demultiplexed, and I again use Snakemake to process the resulting demultiplexed files in parallel.
My first attempt (which is working well) was to use 2 snakefiles.
a snakefile for processing 8 files in parallel
a snakefile for processing generated demultiplexed files in parallel
I would like to use only one snakefile.
Here is the solution with 2 snakefiles:
snakefile #1 for processing 8 files (wildcard {run}) in parallel
configfile: "config.yaml"
rule all:
    input:
        expand("{folder}{run}_R1.fastq.gz", run=config["fastqFiles"], folder=config["fastqFolderPath"]),
        expand('assembled/{run}/{run}.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.assigned.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.unidentified.fastq', run=config["fastqFiles"]),
        expand('log/remove_unaligned/{run}.log', run=config["fastqFiles"]),
        expand('log/illuminapairedend/{run}.log', run=config["fastqFiles"]),
        expand('log/assign_sequences/{run}.log', run=config["fastqFiles"]),
        expand('log/split_sequences/{run}.log', run=config["fastqFiles"])
include: "00-rules/assembly.smk"
include: "00-rules/demultiplex.smk"
snakefile #2 for processing generated demultiplexed files in parallel
SAMPLES, = glob_wildcards('samples/{sample}.fasta')
rule all:
    input:
        expand('samples/{sample}.uniq.fasta', sample=SAMPLES),
        expand('samples/{sample}.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.r.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.c.r.l.u.fasta', sample=SAMPLES),
        expand('log/dereplicate_samples/{sample}.log', sample=SAMPLES),
        expand('log/goodlength_samples/{sample}.log', sample=SAMPLES),
        expand('log/clean_pcrerr/{sample}.log', sample=SAMPLES),
        expand('log/rm_internal_samples/{sample}.log', sample=SAMPLES)
include: "00-rules/filtering.smk"
This solution is working well.
Is it possible to merge these 2 snakefiles into one like this?
configfile: "config.yaml"
rule all:
    input:
        expand("{folder}{run}_R1.fastq.gz", run=config["fastqFiles"], folder=config["fastqFolderPath"]),
        expand('assembled/{run}/{run}.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.ali.assigned.fastq', run=config["fastqFiles"]),
        expand('assembled/{run}/{run}.unidentified.fastq', run=config["fastqFiles"]),
        expand('log/remove_unaligned/{run}.log', run=config["fastqFiles"]),
        expand('log/illuminapairedend/{run}.log', run=config["fastqFiles"]),
        expand('log/assign_sequences/{run}.log', run=config["fastqFiles"]),
        expand('log/split_sequences/{run}.log', run=config["fastqFiles"])
include: "00-rules/assembly.smk"
include: "00-rules/demultiplex.smk"
SAMPLES, = glob_wildcards('samples/{sample}.fasta')
rule all:
    input:
        expand('samples/{sample}.uniq.fasta', sample=SAMPLES),
        expand('samples/{sample}.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.r.l.u.fasta', sample=SAMPLES),
        expand('samples/{sample}.c.r.l.u.fasta', sample=SAMPLES),
        expand('log/dereplicate_samples/{sample}.log', sample=SAMPLES),
        expand('log/goodlength_samples/{sample}.log', sample=SAMPLES),
        expand('log/clean_pcrerr/{sample}.log', sample=SAMPLES),
        expand('log/rm_internal_samples/{sample}.log', sample=SAMPLES)
include: "00-rules/filtering.smk"
So I have to define rule all a second time,
and I get the following error message:
The name all is already used by another rule
Is there a way to have several rule all definitions, or is using several snakefiles the only possible solution?
I would like to use snakemake in the most appropriate way possible.
You are not limited in the naming of the top-level rule. You may call it all, or you may rename it: the only thing that matters is the order of definition. By default Snakemake takes the first rule as the target rule and then constructs the graph of dependencies.
Taking that into consideration, you have several options. First, you can merge both top-level rules from the two workflows into one: at the end of the day, your all rules do nothing except define the target files. Next, you may rename your rules to all1 and all2 (making it possible to run a single workflow by specifying it on the command line) and provide the all rule with the merged input. Finally, you could use subworkflows, but as long as your intention is to squash two scripts into one, that would be overkill.
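For example, a minimal sketch of the first option, assuming the same config and directory layout as above (only two targets from each workflow are shown; the remaining expand() lines go in the same list):
configfile: "config.yaml"
SAMPLES, = glob_wildcards('samples/{sample}.fasta')

rule all:
    input:
        # per-run targets, formerly in snakefile #1
        expand('assembled/{run}/{run}.fastq', run=config["fastqFiles"]),
        expand('log/split_sequences/{run}.log', run=config["fastqFiles"]),
        # per-sample targets, formerly in snakefile #2
        expand('samples/{sample}.uniq.fasta', sample=SAMPLES),
        expand('log/rm_internal_samples/{sample}.log', sample=SAMPLES)

include: "00-rules/assembly.smk"
include: "00-rules/demultiplex.smk"
include: "00-rules/filtering.smk"
One caveat: glob_wildcards() is evaluated when the snakefile is parsed, so SAMPLES will only be populated if the demultiplexed samples/ files already exist at that point.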
One more hint that could help: you don't need to spell out expand('filename{sample}', sample=config["fastqFiles"]) for every file if you define a distinct output for each run. For example:
rule sample:
    input:
        'samples/{sample}.uniq.fasta',
        'samples/{sample}.l.u.fasta',
        'samples/{sample}.r.l.u.fasta',
        'samples/{sample}.c.r.l.u.fasta',
        'log/dereplicate_samples/{sample}.log',
        'log/goodlength_samples/{sample}.log',
        'log/clean_pcrerr/{sample}.log',
        'log/rm_internal_samples/{sample}.log'
    output:
        temp('flag_sample_{sample}_complete')
In this case the all rule becomes trivial:
rule all:
    input: expand('flag_sample_{sample}_complete', sample=SAMPLES)
Or, as I advised before:
rule all:
    input:
        expand('flag_run_{run}_complete', run=config["fastqFiles"]),
        expand('flag_sample_{sample}_complete', sample=SAMPLES)

rule all1:
    input: expand('flag_run_{run}_complete', run=config["fastqFiles"])

rule all2:
    input: expand('flag_sample_{sample}_complete', sample=SAMPLES)
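As a usage sketch for the variant above: a plain snakemake invocation builds the first rule (all), while snakemake all1 builds only the per-run flags and snakemake all2 only the per-sample flags.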
I have a multi-step Azure pipeline used to trigger the execution of a certain job based on keywords I have in Azure DevOps work items.
The first step executed is a PowerShell script that stores a comma-separated list of strings into a 'validTags' variable:
Write-Host "##vso[task.setvariable variable=validTags]$csTags"
After this step, I correctly see the list formatted as I expect:
string1,string2,string3
The 'validTags' variable is then passed as a parameter to another pipeline in which I should split this list and trigger separate jobs:
- template: run.yml
  parameters:
    tags: $(validTags)
    directory: 'path\to\tests'
    platforms: 'platform1,platform2'
In the 'run' pipeline I defined this 'tags' parameter:
parameters:
- name: tags
  type: string
  default: 'someDefaultValue'
and I try to split the parameter:
- ${{each t in split(parameters.tags, ',')}}:
  - script: |
      echo 'starting job for ${{t}}'
but when I execute the pipeline, 't' still holds the full string (string1,string2,string3); it is not split.
I have noticed that if I try to perform the split on the "platforms" parameter, which is passed along with "tags" to the run.yml pipeline, it works. So it seems the problem is related to the fact that I am trying to split a string stored in an external variable?
Anyone with a similar issue? Any help on this is much appreciated.
Thanks
For those interested in the outcome of this issue:
I tested several possible alternate solutions, including the use of global variables and group variables, but without success.
I submitted a request to MSFT engineering support to get some insight on this and their response is:
The pipeline does not support splitting the runtime variable with template syntax ${{ }} currently, and we are not able to find other workarounds to meet your request. Sorry for the inconvenience. Hope you can understand.
So, to overcome the issue, I removed the split at the pipeline level as initially planned, and instead passed the comma-separated string to the template and added the necessary processing there in PowerShell.
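For illustration, a minimal sketch of that template-side processing; the inline step and the echoed text are illustrative, not the actual job-triggering logic:
- powershell: |
    $tags = "${{ parameters.tags }}".Split(',')
    foreach ($t in $tags) {
      Write-Host "starting job for $t"
    }
Since parameters are expanded with template syntax before the script runs, the .NET Split happens at runtime inside PowerShell, where it works fine.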
Another option would have been to perform all the operations from within the first PowerShell script step:
transform the 'run.yml' template into a separate pipeline
in the script, after getting the tags, loop over their values and trigger the 'run.yml' pipeline, passing the single tag as a parameter.
I avoided this solution to keep the operations separate and have more control over the execution flow.
I would like to create a rule with a lambda expression, like below:
when
    rule_id: String() from "784acba8-32e5-41de-bd73-04f9ce2bfaff"
    $: DroolsEventWrapper(Arrays.stream("testing".split("")).filter(element->(element!=null && element.contains("e"))).findFirst().isPresent())
then
    System.out.println("Qualified for "+rule_id);
end
It is just a simple check ("testing" will also be given as a parameter); however, when I use it in the DRL it gives an error like:
mismatched input '.' in rule "784acba8-32e5-41de-bd73-04f9ce2bfaff" in pattern
When I check the location of that dot, it belongs to filter(element->(element!=null && element.contains("e"))); when I omit that part, it works.
I am using the latest version of Drools, 7.55.0.Final. I found some tickets saying that lambda expressions were somewhat buggy in previous versions, but supposedly not in the latest ones.
Am I missing something, or is there any way to run this within DRL?
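For what it's worth, a common way to sidestep DRL parser limits like this is to move the lambda into a plain Java helper and call that from the pattern. This is only a sketch, not a confirmed fix, and RuleHelpers is a hypothetical class name:
// Hypothetical helper class (not from the original question); it keeps the
// lambda in plain Java, out of reach of the DRL expression parser.
import java.util.Arrays;

public class RuleHelpers {
    public static boolean anyFragmentContainsE(String input) {
        return Arrays.stream(input.split(""))
                     .filter(element -> element != null && element.contains("e"))
                     .findFirst()
                     .isPresent();
    }
}
The pattern in the rule then becomes lambda-free:
$: DroolsEventWrapper( RuleHelpers.anyFragmentContainsE("testing") )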
I'm currently reading the documentation on Lex written by Lesk and Schmidt, and I am confused by the REJECT action.
Consider the two rules
a[bc]+ { ... ; REJECT;}
a[cd]+ { ... ; REJECT;}
Input:
ab
Only the first rule matches; here is what the material says about REJECT:
The action REJECT means "go do the next alternative." It causes whatever rule was second choice after the current rule to be executed.
However, there is no second choice here. Will this raise an error?
There are really very few use cases for REJECT; I don't think I've ever seen an instance of it in use other than in examples.
Anyway, unless you specify %option nodefault (or the -s command-line flag), flex will add a default fallback action to your ruleset, equivalent to
.|\n ECHO;
In your case, that pattern will match after the REJECT.
However, it is possible to override the default action; for example, you could add the rule:
.|\n REJECT;
In that case, flex really will not have an alternative after the two REJECTs, and it will print an error message on stderr ("flex scanner jammed") and then call exit.
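To make that concrete, a minimal sketch of a jamming scanner (assuming flex; the printf messages are illustrative):
%option nodefault noyywrap
%%
a[bc]+   { printf("rule 1 matched '%s'\n", yytext); REJECT; }
a[cd]+   { printf("rule 2 matched '%s'\n", yytext); REJECT; }
.|\n     { REJECT; }
%%
int main(void) { yylex(); return 0; }
On the input ab, this should print the rule 1 message, fall through the REJECTs to the catch-all (which also rejects), and then abort with "flex scanner jammed".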
I have a makefile with the following format. First I define what my outputs are:
EXEFILES = myexe1.exe myexe2.exe
Then I define the dependencies for those outputs:
myexe1.exe : myobj1.obj
myexe2.exe : myobj2.obj
Then I have some macros that define extra dependencies for linking:
DEP_myexe1 = lib1.lib lib2.lib
DEP_myexe2 = lib3.lib lib4.lib
Then I have the target for transforming .obj to .exe:
$(EXEFILES):
	$(LINK) -OUT:"Exe\$@" -ADDOBJ:"Obj\$<" -IMPLIB:$($($(DEP_$*)):%=Lib\\%)
What I want to happen is (example for myexe1.exe)
DEP_$* -> DEP_myexe1
$(DEP_myexe1) -> lib1.lib lib2.lib
$(lib1.lib lib2.lib:%=Lib\\%) -> Lib\lib1.lib Lib\lib2.lib
Unfortunately this is not working. When I run make --just-print, the -IMPLIB: arguments are empty. However, if I run $(warning DEP_$*) I get
DEP_myexe1
And when I run $(warning $(DEP_myexe1)) I get
lib1.lib lib2.lib
So for some reason, make does not like the combination of $(DEP_$*). Perhaps it cannot resolve macro names dynamically like this. What can I do to get this to work? Is there an alternative?
Where does $(warning DEP_$*) give you DEP_myexe1 as output exactly? Because given your makefile above it shouldn't.
$* is the stem of the target pattern that matched. In your case, because you have explicit target names, you have no pattern match and so no stem, and so $* is always empty.
Additionally, you are attempting a few too many expansions. You are expanding $* to get myexe1 directly (assuming for the moment that variable works the way you intended). You then prefix that with DEP_ and used $(DEP_$*) to get the lib1.lib lib2.lib. You then expand that result $($(DEP_$*)) and then expand that (empty) result again (to do your substitution) $($($(DEP_$*)):%=Lib\\%).
You want to either use $(@:.exe=) instead of $* in your rule body or use %.exe as your target and then use $* to get myexe1/myexe2.
You then want to drop two levels of expansion from $($($(DEP_$*)):%=Lib\\%) and use $(DEP_$*:%=Lib\\%) instead.
So (assuming you use the pattern rule) you end up with:
%.exe:
	$(LINK) -OUT:"Exe\$@" -ADDOBJ:"Obj\$<" -IMPLIB:$(DEP_$*:%=Lib\\%)
I managed to get it working without needing to resolve macros in the way described above. I modified the linking dependencies like this:
myexe1.exe : myobj1.obj lib1.lib lib2.lib
myexe2.exe : myobj2.obj lib3.lib lib4.lib
Then I need to filter these files by extension in the target recipe:
$(EXEFILES):
	$(LINK) -OUT:"$(EXE_PATH)\$@" -ADDOBJ:$(patsubst %, Obj\\%, $(filter %.obj, $^)) -IMPLIB:$(patsubst %, Lib\\%, $(filter %.lib, $^))
The $(patsubst ...) calls are used to prepend the directory that the relevant files live in.
In the case of myexe1.exe, the link command expands to:
slink -OUT:"Exe\myexe1.exe" -ADDOBJ: Obj\myexe1.obj -IMPLIB: Lib\lib1.lib Lib\lib2.lib
Out of interest's sake, I would still like to know if it is possible to resolve macro names like in the question.
Here is the problem. I am writing a piece of software that queries bug attachments from a bug tracking system. I can already filter the query down to text attachments (text/plain, etc.), and I want to keep only valid patch files, i.e. files that look like diff -u output and can be applied as a patch to a source file. So I need a way to determine which attachments are valid patches. For example,
let's say I have an attachment with the following content:
Index: compiler/cpp/src/generate/t_csharp_generator.cc
--- compiler/cpp/src/generate/t_csharp_generator.cc (revision 1033689)
+++ compiler/cpp/src/generate/t_csharp_generator.cc (working copy)
@@ -456,7 +456,7 @@
t = ((t_typedef*)t)->get_type();
}
if ((*m_iter)->get_value() != NULL) {
- print_const_value(out, "this." + (*m_iter)->get_name(), t, (*m_iter)->get_value(), true, true);
+ print_const_value(out, "this._" + (*m_iter)->get_name(), t, (*m_iter)->get_value(), true, true);
}
}
How can I know this is a valid patch? Is there a regex that matches possible diff -u outputs, so that I can write something like this in Java:
String attachmentContent = .....
if(attachmentContent.matches(regex))...
Thank you in advance,
Elvis
You won't be able to test this with a simple regex, nor with a complex one for that matter: it would require a regular-expression engine able to interpret the portions between @@ as numbers and derive repetition counts from them, and I don't know of an RE engine that does that.
On the other hand, you should not have too much trouble finding a library that parses and loads Unix patches (and this is definitely not a compute-intensive task), so simply trying to load the patch would let you validate it; e.g. java-diff-utils should do that straightforwardly.
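A minimal sketch of that approach, assuming the java-diff-utils library (package com.github.difflib, 4.x); treating an empty delta list as "not a patch" is a heuristic of this sketch, not part of the library's contract:
import com.github.difflib.UnifiedDiffUtils;
import com.github.difflib.patch.Patch;

import java.util.Arrays;
import java.util.List;

public class PatchValidator {
    // Returns true if the attachment parses as a unified diff with at least one hunk.
    public static boolean looksLikeUnifiedDiff(String attachmentContent) {
        try {
            List<String> lines = Arrays.asList(attachmentContent.split("\n"));
            Patch<String> patch = UnifiedDiffUtils.parseUnifiedDiff(lines);
            return !patch.getDeltas().isEmpty();
        } catch (Exception e) {
            // Anything the parser cannot handle is treated as not a patch.
            return false;
        }
    }
}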