Can I patch the patch? - diff

In our CI we patch files. The problem appears when we change the files that we patch. Can I apply the commit's changes to the patch file?
example:
text.txt
A
B
C
D
patch.patch
+++ b/text2.txt
@@ -1,4 +1,4 @@
 A
 B
-C
+X
 D
new.txt:
Y
Y
B
C
D
diff text.txt new.txt > text_to_new.diff
diff --git a/text.txt b/new.txt
index 8422d40..4780582 100644
--- a/text.txt
+++ b/new.txt
@@ -1,4 +1,5 @@
-A
+Y
+Y
 B
 C
 D
Can I update patch.patch with text_to_new.diff?

Snakemake workflow, ChildIOException or MissingInputException

I am trying to add a file renaming step in my current workflow to make it easier on some of the other users. What I want to do is take the contigs.fasta file from a spades assembly directory and rename it to include the sample name. (i.e foo_de_novo/contigs.fasta to foo_de_novo/foo.fasta)
Here is my current code:
configfile: "config.yaml"

import os

def is_file_empty(file_path):
    """ Check if file is empty by confirming if its size is 0 bytes"""
    # Check if singleton file exist and it is empty from bbrepair output
    return os.path.exists(file_path) and os.stat(file_path).st_size == 0

rule all:
    input:
        expand("{sample}_de_novo/{sample}.fasta", sample = config["names"]),

rule fastp:
    input:
        r1 = lambda wildcards: config["sample_reads_r1"][wildcards.sample],
        r2 = lambda wildcards: config["sample_reads_r2"][wildcards.sample]
    output:
        r1 = temp("clean/{sample}_r1.trim.fastq.gz"),
        r2 = temp("clean/{sample}_r2.trim.fastq.gz")
    shell:
        "fastp --in1 {input.r1} --in2 {input.r2} --out1 {output.r1} --out2 {output.r2} --trim_front1 20 --trim_front2 20"

rule bbrepair:
    input:
        r1 = "clean/{sample}_r1.trim.fastq.gz",
        r2 = "clean/{sample}_r2.trim.fastq.gz"
    output:
        r1 = temp("clean/{sample}_r1.fixed.fastq"),
        r2 = temp("clean/{sample}_r2.fixed.fastq"),
        singles = temp("clean/{sample}.singletons.fastq")
    shell:
        "repair.sh -Xmx10g in1={input.r1} in2={input.r2} out1={output.r1} out2={output.r2} outs={output.singles}"

rule spades:
    input:
        r1 = "clean/{sample}_r1.fixed.fastq",
        r2 = "clean/{sample}_r2.fixed.fastq",
        s = "clean/{sample}.singletons.fastq"
    output:
        directory("{sample}_de_novo")
    run:
        isempty = is_file_empty("clean/{sample}.singletons.fastq")
        if isempty == "False":
            shell("spades.py --careful --phred-offset 33 -1 {input.r1} -2 {input.r2} -s {input.singletons} -o {output}")
        else:
            shell("spades.py --careful --phred-offset 33 -1 {input.r1} -2 {input.r2} -o {output}")

rule rename_spades:
    input:
        "{sample}_de_novo/contigs.fasta"
    output:
        "{sample}_de_novo/{sample}.fasta"
    shell:
        "cp {input} {output}"
When I write it like this I get a MissingInputException. When I change the last rule to this:
rule rename_spades:
    input:
        "{sample}_de_novo"
    output:
        "{sample}_de_novo/{sample}.fasta"
    shell:
        "cp {input} {output}"
I get the ChildIOException.
I think I understand why Snakemake is unhappy with both versions. The first fails because no rule explicitly lists "{sample}_de_novo/contigs.fasta" as an output; it is just one of several files SPAdes writes. The second fails because Snakemake doesn't like an output file that lives inside a directory another rule declares as its output. However, I am at a loss on how to fix this.
Is there a way to ask Snakemake to look into a directory for a file and then perform the requested task?
Thank you,
Sean
EDIT File Structure of Spades output
Sample_de_novo
|-corrected/
|-K21/
|-K33/
|-K55/
|-K77/
|-misc/
|-mismatch_corrector/
|-tmp/
|-assembly_graph.fastg
|-assembly_graph_with_scaffolds.gfa
|-before_rr.fasta
|-contigs.fasta
|-contigs.paths
|-dataset.info
|-input_dataset.yaml
|-params.txt
|-scaffolds.fasta
|-scaffolds.paths
|-spades.log
Make {sample}_de_novo/contigs.fasta the output of the spades rule and parse its path to get the directory to pass to spades -o. Snakemake won't mind if other files are created in addition to contigs.fasta. This should run in --dry-run mode:
import os

rule all:
    input:
        expand('{sample}_de_novo/{sample}.fasta', sample=['A', 'B']),

rule spades:
    output:
        fasta='{sample}_de_novo/contigs.fasta',
    run:
        # Derive the SPAdes output directory from the declared output file
        outdir=os.path.dirname(output.fasta)
        shell(f'spades ... -o {outdir}')

rule rename:
    input:
        fasta='{sample}_de_novo/contigs.fasta',
    output:
        fasta='{sample}_de_novo/{sample}.fasta',
    shell:
        r"""
        mv {input.fasta} {output.fasta}
        """
Nope, spoke too soon. It didn't name the output directory correctly, so I moved the directory to params, and now it is finally working the way I wanted.
rule spades:
    input:
        r1 = "clean/{sample}_r1.fixed.fastq",
        r2 = "clean/{sample}_r2.fixed.fastq",
        s = "clean/{sample}.singletons.fastq"
    output:
        "{sample}_de_novo/contigs.fasta"
    params:
        outdir = directory("{sample}_de_novo/")
    run:
        # input.s is the resolved singletons path for this sample;
        # is_file_empty returns a bool, so test it directly
        isempty = is_file_empty(input.s)
        if not isempty:
            shell("spades.py --isolate --phred-offset 33 -1 {input.r1} -2 {input.r2} -s {input.s} -o {params.outdir}")
        else:
            shell("spades.py --isolate --phred-offset 33 -1 {input.r1} -2 {input.r2} -o {params.outdir}")

rule rename_spades:
    input:
        "{sample}_de_novo/contigs.fasta"
    output:
        "{sample}_de_novo/{sample}.fasta"
    shell:
        "cp {input} {output}"
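One detail worth checking in a rule like this: is_file_empty returns a Python bool, and a bool never compares equal to the string "False", so the singletons branch should test the value directly. Here is a minimal plain-Python sketch of that decision, outside Snakemake. The helper name singles_args is hypothetical and only illustrates the idea.

```python
import os

def is_file_empty(file_path):
    """Return True only if the file exists and its size is 0 bytes."""
    return os.path.exists(file_path) and os.stat(file_path).st_size == 0

def singles_args(singles_path):
    """Hypothetical helper: extra SPAdes arguments for the singletons file.

    Returns ["-s", path] unless the file exists and is empty.
    """
    # Test the bool directly; `is_file_empty(p) == "False"` is never true.
    if not is_file_empty(singles_path):
        return ["-s", singles_path]
    return []
```

Inside a run block the same test would be applied to the resolved input path (for example input.s), not to a string that still contains the {sample} placeholder.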

how to get diff filenames with output

Is there a way for diff to return the names of the files being compared as well as the output? For example,
instead of:
17c17
< free ((qu -> vals) - 1);
---
> free (qu -> vals);
I am looking for:
17c17
file1.c
< free ((qu -> vals) - 1);
---
file2.c
> free (qu -> vals);
Is it possible?
Thanks
The -u switch does include the filenames:
#!/bin/bash
echo " free ((qu -> vals) - 1);" > file1.c
echo " free (qu -> vals);" > file2.c
diff -u file1.c file2.c
output:
--- file1.c 2014-03-13 17:46:43.000000000 -0500
+++ file2.c 2014-03-13 17:46:43.000000000 -0500
@@ -1 +1 @@
- free ((qu -> vals) - 1);
+ free (qu -> vals);
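For comparison, Python's standard difflib module produces the same kind of unified output with the file names in the header. This is only a sketch of the idea, not a replacement for diff -u; the wrapper name unified_with_names is made up for illustration.

```python
import difflib

def unified_with_names(a_lines, b_lines, a_name, b_name):
    """Return a unified diff whose '---'/'+++' header carries the file names."""
    return "".join(
        difflib.unified_diff(a_lines, b_lines, fromfile=a_name, tofile=b_name)
    )

# Same content as the two echo commands above
old = [" free ((qu -> vals) - 1);\n"]
new = [" free (qu -> vals);\n"]
print(unified_with_names(old, new, "file1.c", "file2.c"), end="")
```

Unlike diff -u, this header has no timestamps unless fromfiledate and tofiledate are passed as well.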

diff ignore white spaces or the same string on a different line

I need to diff two files, but if the same lines appear in both files and only their line positions differ, I don't want any output.
Example:
File1:
cc aaaw
bb bbbw
aa cccw
File2:
cc aaaw
bb bbbw
aa cccw
diff file1 file2:
2d1
< bb bbbw
3a3
> bb bbbw
-> I don't want any output here.
But if file1 is the one above and file2 is:
cc aaaw
bb bbbw
aa cccw
ddddddd
I want this output:
4a5
> ddddddd
Thanks.
You might use diff -B to ignore empty/blank lines.
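As a rough illustration of what ignoring blank lines means, here is a Python sketch that drops empty and whitespace-only lines before comparing. It only approximates diff -B, and the sample strings (including where the blank lines sit) are assumptions made up for the example.

```python
import difflib

def diff_ignoring_blank_lines(a_text, b_text):
    """Return only the added/removed lines, after discarding blank lines."""
    a = [ln for ln in a_text.splitlines() if ln.strip()]
    b = [ln for ln in b_text.splitlines() if ln.strip()]
    # ndiff prefixes removed lines with "- " and added lines with "+ "
    return [ln for ln in difflib.ndiff(a, b) if ln[:2] in ("- ", "+ ")]

# Hypothetical inputs: same lines, blank lines in different places
file1 = "cc aaaw\n\nbb bbbw\naa cccw\n"
file2 = "cc aaaw\nbb bbbw\n\naa cccw\n"

print(diff_ignoring_blank_lines(file1, file2))
print(diff_ignoring_blank_lines(file1, file2 + "ddddddd\n"))
```

The first call reports no differences; the second reports only the extra ddddddd line.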

Sed remove text to text except last line

I want to delete part of text:
0
1
test1
a
b
random letter
test2
e
f
g
I want to get:
0
1
test2
e
f
g
I've tried using sed:
sed '/test1/,/test2/d'
But that removes test2 too.
How can I delete the text but keep test2, if I don't know exactly what text comes before test2?
I need to use awk or sed.
Give this a try:
sed '/test1/,/test2/{/test2/!d}'
test with your example:
kent$ echo "0
1
test1
a
b
random letter
test2
e
f
g"|sed '/test1/,/test2/{/test2/!d}'
0
1
test2
e
f
g
awk 'BEGIN{p=1}/test1/{p=0}/test2/{p=1}p' your_file
Tested Below:
> cat temp
0
1
test1
a
b
random letter
test2
e
f
g
>
> awk 'BEGIN{p=1}/test1/{p=0}/test2/{p=1}p' temp
0
1
test2
e
f
g
>
If you want to search for whole word in awk:
search like below:
/\<WORD\>/
Alternatively you can go perl as well:
perl -lne 'BEGIN{$p=1}if(/\btest1\b/){$p=0}if(/\btest2\b/){$p=1}print if $p' your_file
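The awk and perl one-liners above share one idea: a flag that is switched off at test1 and back on at test2, with a line printed only while the flag is on. The same logic as a small Python sketch (the function name drop_between is made up for illustration; matching is by plain substring, not whole word):

```python
def drop_between(lines, start="test1", stop="test2"):
    """Drop lines from the one containing `start` up to, but not including,
    the one containing `stop`."""
    keep = True
    out = []
    for line in lines:
        if start in line:
            keep = False   # like /test1/{p=0}
        if stop in line:
            keep = True    # like /test2/{p=1}
        if keep:
            out.append(line)
    return out

sample = ["0", "1", "test1", "a", "b", "random letter", "test2", "e", "f", "g"]
print("\n".join(drop_between(sample)))
```

Because the stop check runs before the line is emitted, the test2 line itself is kept.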

How does one extract a unified-diff style patch subset?

Every time I want to take a subset of a patch, I'm forced to write a script to only extract the indices that I want.
e.g. I have a patch that applies to sub directories
'yay' and 'foo'.
Is there a way to create a new patch or apply only a subset of a patch? i.e. create a new patch from the existing patch that only takes all indices that are under sub directory 'yay'. Or all indices that are not under sub directory 'foo'
If I have a patch like ( excuse the below pseudo-patch):
Index : foo/bar
yada
yada
- asdf
+ jkl
yada
yada
Index : foo/bah
blah
blah
- 28
+ 29
blah
blah
blah
Index : yay/team
go
huskies
- happy happy
+ joy joy
cougars
suck
How can I extract or apply only the 'yay' subdirectory like:
Index : yay/team
go
huskies
- happy happy
+ joy joy
cougars
suck
I know if I script up a solution I'll be re-inventing the wheel...
Take a look at the filterdiff utility, which is part of patchutils.
For example, if you have the following patch:
$ cat example.patch
diff -Naur orig/a/bar new/a/bar
--- orig/a/bar 2009-12-02 12:41:38.353745751 -0800
+++ new/a/bar 2009-12-02 12:42:17.845745951 -0800
@@ -1,3 +1,3 @@
 4
-5
+e
 6
diff -Naur orig/a/foo new/a/foo
--- orig/a/foo 2009-12-02 12:41:32.845745768 -0800
+++ new/a/foo 2009-12-02 12:42:25.697995617 -0800
@@ -1,3 +1,3 @@
 1
 2
-3
+c
diff -Naur orig/b/baz new/b/baz
--- orig/b/baz 2009-12-02 12:41:42.993745756 -0800
+++ new/b/baz 2009-12-02 12:42:37.585745735 -0800
@@ -1,3 +1,3 @@
-7
+z
 8
 9
Then you can run the following command to extract the patch for only things in the a directory like this:
$ cat example.patch | filterdiff -i 'new/a/*'
--- orig/a/bar 2009-12-02 12:41:38.353745751 -0800
+++ new/a/bar 2009-12-02 12:42:17.845745951 -0800
@@ -1,3 +1,3 @@
 4
-5
+e
 6
--- orig/a/foo 2009-12-02 12:41:32.845745768 -0800
+++ new/a/foo 2009-12-02 12:42:25.697995617 -0800
@@ -1,3 +1,3 @@
 1
 2
-3
+c
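If patchutils is not available, the same kind of selection can be sketched in a few lines of Python. This is not how filterdiff is implemented and it is far less robust: it assumes unified diffs whose file sections start with a "--- " line followed by a "+++ " line, paths without whitespace, and it matches the glob against the "+++ " path. The function name filter_patch is hypothetical.

```python
import fnmatch

def filter_patch(patch_text, pattern):
    """Keep only the per-file sections whose '+++ ' path matches the glob."""
    lines = patch_text.splitlines(keepends=True)
    kept, keep = [], False
    for i, line in enumerate(lines):
        if line.startswith("diff "):
            # Command line preceding a file section; dropped, and it ends
            # the previous section
            keep = False
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("--- ") and nxt.startswith("+++ "):
            # New file section: take the path, ignoring the timestamp
            parts = nxt[4:].split()
            keep = bool(parts) and fnmatch.fnmatch(parts[0], pattern)
        if keep:
            kept.append(line)
    return "".join(kept)

example = (
    "diff -Naur orig/a/bar new/a/bar\n"
    "--- orig/a/bar 2009-12-02 12:41:38.353745751 -0800\n"
    "+++ new/a/bar 2009-12-02 12:42:17.845745951 -0800\n"
    "@@ -1,3 +1,3 @@\n 4\n-5\n+e\n 6\n"
    "diff -Naur orig/b/baz new/b/baz\n"
    "--- orig/b/baz 2009-12-02 12:41:42.993745756 -0800\n"
    "+++ new/b/baz 2009-12-02 12:42:37.585745735 -0800\n"
    "@@ -1,3 +1,3 @@\n-7\n+z\n 8\n 9\n"
)
print(filter_patch(example, "new/a/*"), end="")
```

For real patches filterdiff remains the safer choice, since it understands more diff formats and edge cases.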
Here's my quick and dirty Perl solution.
perl -ne '@a = split /^Index :/m, join "", <>; END { for(@a) {print "Index :", $_ if (m, yay/team,)}}' < foo.patch
In response to sigjuice's request in the comments, I'm posting my script solution. It isn't 100% bullet proof, and I'll probably use filterdiff instead.
base_usage_str=r'''
python %prog index_regex patch_file

description:
  Extracts all indices from a patch-file matching 'index_regex'

  e.g.
    python %prog '^evc_lib' p.patch > evc_lib_p.patch
  Will extract all indices which begin with evc_lib.
  -or-
    python %prog '^(?!evc_lib)' p.patch > not_evc_lib_p.patch
  Will extract all indices which do *not* begin with evc_lib.

authors:
  Ross Rogers, 2009.04.02
'''

import re,os,sys
from optparse import OptionParser

def main():
    parser = OptionParser(usage=base_usage_str)
    (options, args) = parser.parse_args(args=sys.argv[1:])
    if len(args) != 2:
        parser.print_help()
        if len(args) == 0:
            sys.exit(0)
        else:
            sys.exit(1)
    (index_regex,patch_file) = args
    sys.stderr.write('Extracting patches for indices found by regex:%s\n'%index_regex)
    #print('user_regex',index_regex)
    user_index_match_regex = re.compile(index_regex)
    # Index: verification/ring_td_cte/tests/mmio_wr_td_target.e
    # --- sw/cfg/foo.xml 2009-04-30 17:59:11 -07:00
    # +++ sw/cfg/foo.xml 2009-05-11 09:26:58 -07:00
    # One file section: a '--- ' line plus following lines, up to the first
    # line that does not start with ' ', '@', '+' or '-'
    index_cre = re.compile(r'''(?:(?<=^)|(?<=\n))(--- (?:.*\n){2,}?(?![ @\+\-]))''')
    with open(patch_file,'r') as patch_fh:
        all_patch_sets = index_cre.findall(patch_fh.read())
    for file_edit in all_patch_sets:
        # extract index subset
        index_path = re.compile(r'\+\+\+ (?P<index>[\w_\-/\.]+)').search(file_edit).group('index').strip()
        if user_index_match_regex.search(index_path):
            sys.stderr.write("Index regex matched index: "+index_path+"\n")
            # Python 3 replacement for the original `print file_edit,`
            sys.stdout.write(file_edit)

if __name__ == '__main__':
    main()