If matched then print all using awk - command-line

I have a file which contains many sub-sections each starting with [begin] and ending with [end]:
[begin li1_1378184738754_91]
header=7075|lime|0|0|109582|0|1|2700073||0|0|0|[355]|1|0|ssb-li1-1378184738754-90||0||LIME |0|saved=true|0.002406508312038836|0|[ser=zu1:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=uzu6:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzs5:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=sv-stda-zu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=hzu8:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=lzu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=yzu2:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzu7:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer]|0|null|false|40||false|
attrs=0|0|0||0|
ptitle=690751404|1|1|1|Rest:1998636||||||2700401|175619|900.5636134725806|0.985486|39.166666666666664|$9.99|100.0|1|||
seller=1998636|1|9.99|1|-1||0|||||true||4.7937584|10412|false|
ptitle=5543369186|2|1|1|Rest:1533891||||||2700211|19615|886.8211044369053|0.776121|34.0|$119.99|100.0|1|||
seller=1533891|1|119.99|3|-1|1.0:text,In+size+6.0%2C7.0%2C8.0%2C8.5%2C9.0%2C9.5%2C10.0%2C...,0.0,,,,0,0,|2|||||true||2.95|20|true|
ptitle=622529158|3|1|1|||||||2700408|67402|796.5289827432475|0.893899|63.0|$5.27|100.0|1|||
seller=4281413|1|5.27|1|-1||0|||||true||4.695052|1769|true|
ptitle=5507199621|4|1|1|||||||2700220|56412|706.9031281251306|0.791171|45.0|$99.99|100.0|1|||
seller=4806107|1|-1.0|1|-1|1.0:sale,$,30.000000000000014,0.0,,,,0,0,:text,In+size+6.0%2C6.5%2C7.0%2C7.5%2C8.0%2C8.5%2C9.0%2C9...,0.0,,,,0,0,|2||||$130 $30.00 off|false||5.0|1|false|
ptitle=5502728013|5|1|1|||||||900000|0|698.7772340643119|0.836740|75.0|$40.95|100.0|1|||
seller=955448|1|40.95|1|-1||0|||||false||4.142857|7|false|
ptitle=840662011|6|1|1|Rest:265238||||||300233|62718|683.2927820751431|0.995513|52.0|$22.95|100.0|1|||
seller=265238|1|22.95|1|-1||0|||||false||4.478261|23|false|
ptitle=848084980|8|1|1|||||||2700073|145653|670.4809846773688|0.880587|60.0|$24.99|100.0|1|||
seller=5267046|1|24.99|1|-1||0|||||true||0.0|0|false|
ptitle=891200492|9|1|1|Rest:1030132||||||2701003|17215|668.8437575254773|0.825491|32.0|$519.99|100.0|1|||
seller=1030132|1|519.99|1|-1||0|||||false||4.7391305|23|false|
ptitle=641974054|10|1|1|||||||900000|69433|667.6678790058678|0.752129|57.0|$4.19|100.0|1|||
seller=3365158|1|4.19|1|-1||0|||||true||4.70907|4410|true|
ptitle=517591869|12|1|1|Rest:4802895||||||2700408|127644|643.0972570735605|0.893899|17.25|$23.95|100.0|1|||
seller=4318776|1|-1.0|3|-1||0|||||false||0.0|0|false|
ptitle=541549480|13|1|1|Rest:1180414||||||2702000|105832|597.4904572011968|0.752129|24.666666666666664|$8.27|100.0|1|||
seller=4636561|1|8.27|1|-1||0|||||false||4.8283377|734|true|
ptitle=1020561900|14|1|1|||||||2700063|159813|594.4717491579845|0.934869|75.0|$5.39|100.0|1|||
seller=4722645|1|5.39|1|-1|1.0:sale,$,0.6000000000000005,0.0,,,,0,0,:text,Free+Shipping+on+All+Orders%21,0.0,201301010000/,,,0,0,|2||||$5.99 $0.60 off|true||4.3942246|1593|true|
ptitle=507792308|15|1|1|Rest:4683455||||||2702000|105832|591.7739184402442|0.768311|22.5|$9.48|100.0|1|||
seller=4910651|1|-1.0|2|-1||0|||||false||5.0|1|false|
ptitle=1090571346|16|1|1|Rest:4452919||||||2700211|20824|776.4814913363535|0.776121|35.0|$59.99|100.0|1|||
seller=1533891|1|59.99|1|-1|1.0:sale,$,49.99999999999999,0.0,,,,0,0,:text,In+size+7.5%2C8.0%2C8.5%2C9.0%2C9.5%2C10.0%2C10.5...,0.0,,,,0,0,|2||||$110 $50.00 off|true||2.95|20|true|
ptitle=573017390|17|1|1|||||||2700073|91937|679.695660577044|0.880587|33.5|$14.85|100.0|1|||
seller=4281413|1|14.85|1|-1||0|||||true||4.695052|1769|true|
ptitle=5502723300|18|1|1|||||||900000|0|639.3095640940136|0.836740|75.0|$50.95|100.0|1|||
seller=955448|1|50.95|1|-1||0|||||false||4.142857|7|false|
ptitle=940022974|20|1|1|||||||2700600|58701|569.9503499778303|0.875839|59.0|$14.40|100.0|1|||
seller=4825227|1|14.4|1|12||0|||||true||4.0289855|276|true|
ptitle=5513277553|21|1|1|||||||2700220|56412|565.2712749001105|0.776121|44.33333333333333|$129.95|100.0|1|||
seller=4825252|1|129.95|1|23||0|||||true||4.0289855|276|true|
ptitle=890329961|22|1|1|||||||2700408|133796|564.7642425785796|0.837916|34.75|$61.95|100.0|1|||
seller=4825235|1|61.95|4|19||0|||||true||4.0289855|276|true|
ptitle=753852910|24|1|1|||||||2700073|146738|557.7419123688652|0.934869|47.69230769230769|$26.99|100.0|1|||
seller=4722645|1|26.99|10|-1|1.0:sale,$,3.0,0.0,,,,0,0,:text,Free+Shipping+on+All+Orders%21,0.0,201301010000/,,,0,0,|2||||$29.99 $3.00 off|true||4.3942246|1593|true|
ptitle=654738989|26|1|1|||||||900000|84012|554.7756559595525|0.752129|57.0|$3.19|100.0|1|||
seller=3365158|1|3.19|1|-1||0|||||true||4.70907|4410|true|
ptitle=707747307|27|1|1|Rest:4736009||||||2700063|76249|552.234395428327|0.889614|19.857142857142854|$6.39|100.0|1|||
seller=4736009|1|6.39|1|-1||0|||||false||4.8071113|15356|true|
ptitle=63531001|28|1|1|||||||2700408|82712|625.0421885589608|0.893899|47.166666666666664|$7.69|100.0|1|||
seller=4281413|1|7.69|3|-1||0|||||true||4.695052|1769|true|
ptitle=5502728016|29|1|1|||||||900000|0|605.9895507237038|0.836740|75.0|$503.00|100.0|1|||
seller=955448|1|503.0|1|-1||0|||||false||4.142857|7|false|
ptitle=507792308|31|1|1|Rest:4683455||||||2702000|105832|559.6902659046442|0.752129|22.5|$8.99|100.0|1|||
seller=5105812|1|-1.0|1|-1||0|||||false||0.0|0|false|
ptitle=753852910|32|1|1|||||||2700073|146738|545.9987095658629|0.870929|47.69230769230769|$22.49|100.0|1|||
seller=4143386|1|22.49|6|-1|1.0:sale,$,7.5,0.0,,,,0,0,:text,Free+Shipping+on+Orders+Over+%24100,0.0,201109010000/201409302359,,,0,0,|2||||$29.99 $7.50 off|false||4.7316346|2355|true|
ptitle=5513277553|33|1|1|Rest:1533891||||||2700220|56412|653.3133907916089|0.825491|44.33333333333333|$149.99|100.0|1|||
seller=1533891|1|149.99|3|-1|1.0:text,In+size+5.0%2C5.5%2C6.0%2C6.5%2C7.0%2C7.5%2C8.0%2C8...,0.0,,,,0,0,|2|||||true||2.95|20|true|
ptitle=63531001|34|1|1|||||||2700408|82712|541.8233547780552|0.893899|47.166666666666664|$7.72|100.0|1|||
seller=2370155|1|7.72|4|-1||0|||||false||4.85|40|false|
ptitle=1018957017|35|1|1|||||||2700073|145653|540.6093714604533|0.860614|56.0|$25.95|100.0|1|||
seller=5036683|1|25.95|1|-1||0|||||false||4.8405056|366|false|
ptitle=743682867|36|1|1|||||||2700073|63437|539.5985846455641|0.870929|58.0|$46.99|100.0|1|||
seller=193176|1|46.99|1|-1||0|||||true||4.8511987|1418|true|
ptitle=679858288|37|1|1|||||||2700063|188669|535.1360632897284|0.902031|30.0|$12.41|100.0|1|||
seller=4143386|1|12.41|2|-1|1.0:sale,$,1.379999999999999,0.0,,,,0,0,:text,Free+Shipping+on+Orders+Over+%24100,0.0,201109010000/201409302359,,,0,0,|2||||$13.79 $1.38 off|false||4.7316346|2355|true|
ptitle=994328713|38|1|1|||||||2700073|71463|534.7715925279717|0.870929|58.0|$1.29|100.0|1|||
seller=1787388|1|1.29|1|-1||0|||||false||4.680464|3624|false|
ptitle=886915818|40|1|1|||||||2700444|201835|529.7519801432289|0.934869|65.5|$44.99|100.0|1|||
seller=4561883|1|44.99|2|-1||0|||||true||4.7913384|508|false|
seller_hidden=227502|990765963|1147436601|-1
seller_hidden=5310958|622529158|5645627277|-1
seller_hidden=4825254|5543369186|5651114316|23
seller_hidden=5289138|5548930281|5653769481|-1
[end li1_1378184738754_91]
I am trying to run the command cat /home/nextag/logs/OutpdirImpressions.log.2013-09-02 | awk -F "$begin" '{print $0}' | awk '$0 ~ "header=7075" {print $0}'
With this command I want to split the entire file into sub-sections beginning with the word 'begin', and then keep only those sub-sections which contain 'header=7075'.
The expected output is that it prints the entire sub-section (those which contain that string), but I am getting only this portion as output:
header=7075|lime|0|0|109582|0|1|2700073||0|0|0|[355]|1|0|ssb-li1-1378184738754-90||0||LIME
|0|saved=true|0.002406508312038836|0|[ser=zu1:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=uzu6:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzs5:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=sv-stda-zu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=hzu8:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=lzu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=yzu2:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzu7:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer]|0|null|false|40||false|
I have tried using if in various ways, but it doesn't work. Can somebody please help me?
I tried awk -F "$begin" '{if($0 ~ "header=7075") {print $0}}' /home/nextag/logs/OutpdirImpressions.log.2013-09-02. It gave the same result.
Can somebody please suggest how I can get the complete sub-section in the result?

Try this awk one-liner:
awk '$1=="[end"{p=0}/^header=7075/{p=1}p' file
In parts:
$1=="[end"{p=0} if you reach a line, with the first word "[end", then set the flag to zero
/^header=7075/{p=1} If you reach a line, which begins with "header=7075", set set the flag to one
p if the flag is non-zero, print the current line (equivalent to p{print} or p{print $0} or p!=0{print $0}
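Note that in your original pipeline -F sets the field separator, not a record separator, and $begin is an undefined shell variable that expands to nothing, which is why the file never got split into sub-sections. If you also want the [begin ...] line itself in the output, here is a sketch (assuming every sub-section is bracketed by [begin ...] and [end ...] lines) that buffers each whole block and prints it only when it contains the header:
awk '
/^\[begin/ { buf = ""; inblk = 1 }      # a new sub-section starts: reset the buffer
inblk      { buf = buf $0 ORS }         # collect every line of the current block
/^\[end/   { inblk = 0                  # the sub-section ended:
             if (buf ~ /header=7075/)   # print the whole block only if it matched
                 printf "%s", buf
           }' /home/nextag/logs/OutpdirImpressions.log.2013-09-02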

Related

WildcardError - No values given for wildcard - Snakemake

I am really lost as to what exactly I can do to fix this error.
I am running snakemake to perform some post alignment quality checks.
My code looks like this:
SAMPLES = ["Exome_Tumor_sorted_mrkdup_bqsr", "Exome_Norm_sorted_mrkdup_bqsr",
           "WGS_Tumor_merged_sorted_mrkdup_bqsr", "WGS_Norm_merged_sorted_mrkdup_bqsr"]

rule all:
    input:
        expand("post-alignment-qc/flagstat/{sample}.txt", sample=SAMPLES),
        expand("post-alignment-qc/CollectInsertSizeMetics/{sample}.txt", sample=SAMPLES),
        expand("post-alignment-qc/CollectAlignmentSummaryMetrics/{sample}.txt", sample=SAMPLES),
        expand("post-alignment-qc/CollectGcBiasMetrics/{sample}_summary.txt", samples=SAMPLES) # this is the problem causing line

rule flagstat:
    input:
        bam = "align/{sample}.bam"
    output:
        "post-alignment-qc/flagstat/{sample}.txt"
    log:
        err='post-alignment-qc/logs/flagstat/{sample}_stderr.err'
    shell:
        "samtools flagstat {input} > {output} 2> {log.err}"

rule CollectInsertSizeMetics:
    input:
        bam = "align/{sample}.bam"
    output:
        txt="post-alignment-qc/CollectInsertSizeMetics/{sample}.txt",
        pdf="post-alignment-qc/CollectInsertSizeMetics/{sample}.pdf"
    log:
        err='post-alignment-qc/logs/CollectInsertSizeMetics/{sample}_stderr.err',
        out='post-alignment-qc/logs/CollectInsertSizeMetics/{sample}_stdout.txt'
    shell:
        "gatk CollectInsertSizeMetrics -I {input} -O {output.txt} -H {output.pdf} 2> {log.err}"

rule CollectAlignmentSummaryMetrics:
    input:
        bam = "align/{sample}.bam",
        genome= "references/genome/ref_genome.fa"
    output:
        txt="post-alignment-qc/CollectAlignmentSummaryMetrics/{sample}.txt",
    log:
        err='post-alignment-qc/logs/CollectAlignmentSummaryMetrics/{sample}_stderr.err',
        out='post-alignment-qc/logs/CollectAlignmentSummaryMetrics/{sample}_stdout.txt'
    shell:
        "gatk CollectAlignmentSummaryMetrics -I {input.bam} -O {output.txt} -R {input.genome} 2> {log.err}"

rule CollectGcBiasMetrics:
    input:
        bam = "align/{sample}.bam",
        genome= "references/genome/ref_genome.fa"
    output:
        txt="post-alignment-qc/CollectGcBiasMetrics/{sample}_metrics.txt",
        CHART="post-alignment-qc/CollectGcBiasMetrics/{sample}_metrics.pdf",
        S="post-alignment-qc/CollectGcBiasMetrics/{sample}_summary.txt"
    log:
        err='post-alignment-qc/logs/CollectGcBiasMetrics/{sample}_stderr.err',
        out='post-alignment-qc/logs/CollectGcBiasMetrics/{sample}_stdout.txt'
    shell:
        "gatk CollectGcBiasMetrics -I {input.bam} -O {output.txt} -R {input.genome} -CHART = {output.CHART} "
        "-S {output.S} 2> {log.err}"
The error message says the following:
WildcardError in line 9 of Snakefile:
No values given for wildcard 'sample'.
File "Snakefile", line 9, in <module>
In my code above I have indicated the problem-causing line. When I simply remove this line, everything runs perfectly. I am really confused, because I pretty much copied and pasted each rule, and this is the only rule that causes any problems.
If someone could point out what I did wrong, I would be very thankful!
Cheers!
Seems like it could be a spelling mistake - in the highlighted line, you write samples=SAMPLES, but the wildcard is called {sample} without the "s".
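The corrected line would be:
expand("post-alignment-qc/CollectGcBiasMetrics/{sample}_summary.txt", sample=SAMPLES)
With samples=SAMPLES, expand never receives any values for the {sample} placeholder, so it stays unresolved and Snakemake reports that no values were given for the wildcard.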

Retrieve results between two patterns plus one line in sed

I would like to extract all lines between INFO:root:id is and one line after the INFO:root:newId.
Can anyone advise how I can achieve this?
Currently I'm using
sed -n '/INFO:root:id is/,/INFO:root:newId/p' 1/python.log
and I'm trying to figure out how to print one line after the second pattern match.
INFO:root:id is
INFO:root:16836211
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): abc.hh.com
DEBUG:urllib3.connectionpool:https://abc.hh.com:443 "POST /api/v2/import/row.json HTTP/1.1" 201 4310
INFO:root:newId
INFO:root:35047536
INFO:root:id is
INFO:root:46836211
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): abc.hh.com
DEBUG:urllib3.connectionpool:https://abc.hh.com:443 "POST /api/v2/import/row.json HTTP/1.1" 201 4310
INFO:root:newId
INFO:root:55547536
If I am understanding the question correctly:
$ seq 10 | sed -n '/3/,/5/{/5/N;p;}'
3
4
5
6
/3/ is the starting regex and /5/ is the ending regex
/5/N reads one additional line when the ending regex matches
Tested on GNU sed; the syntax might differ for other versions.
With awk:
$ seq 10 | awk '/3/{f=1} f; /5/{f=0; if((getline a)>0) print a}'
3
4
5
6
Unclear whether you want only the first set of lines after a match or all matches.
If you want the first set between the matching patterns, it is easy if you use /INFO:root:id/ for your end match as well and then use head -n -1 to print everything but the last line.
$ sed -n '/INFO:root:id is/,/INFO:root:id/p' test.txt | head -n -1
INFO:root:id is
INFO:root:16836211
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): abc.hh.com
DEBUG:urllib3.connectionpool:https://abc.hh.com:443 "POST /api/v2/import/row.json HTTP/1.1" 201 4310
INFO:root:newId
INFO:root:35047536
Just use flags to indicate when you've found the beginning and ending regexps and print accordingly:
$ seq 10 | awk 'e{print buf $0; buf=""; b=e=0} /3/{b=1} b{buf = buf $0 ORS; if (/5/) e=1}'
3
4
5
6
Note that this does not have the potential issue of printing lines when there's only the beginning or ending regexp present but not both. The other answers, including your currently accepted answer, have that problem:
$ seq 10 | sed -n '/3/,/27/{/27/N;p;}'
3
4
5
6
7
8
9
10
$ seq 10 | awk '/3/{f=1} f; /27/{f=0; if((getline a)>0) print a}'
3
4
5
6
7
8
9
10
$ seq 10 | awk 'e{print buf $0; buf=""; b=e=0} /3/{b=1} b{buf = buf $0 ORS; if (/27/) e=1}'
$
Note that the script I posted correctly didn't print anything because a block of text beginning with 3 and ending with 27 was not present in the input.
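Applied to the patterns and log file from the question, that last approach would look something like this (a sketch, untested against the real log):
awk '
e { print buf $0; buf = ""; b = e = 0 }     # line after the end pattern: print the whole block
/INFO:root:id is/ { b = 1 }                 # begin pattern seen: start buffering
b { buf = buf $0 ORS                        # collect lines from begin through end pattern
    if (/INFO:root:newId/) e = 1            # end pattern seen: flush on the next line
  }' 1/python.log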

Sed replace sql

In page.sql, I want to replace the following text:
INSERT INTO `page`
by this:
INSERT INTO `page` (page_id, page_namespace, page_title, page_restrictions, page_counter, page_is_redirect, page_is_new, page_random, page_touched, page_latest, page_len)
I am a beginner, and I don't know what's wrong with this GNU sed command on Windows:
sed 's/INSERT INTO `page`/INSERT INTO `page` (page_id, page_namespace, page_title, page_restrictions, page_counter, page_is_redirect, page_is_new, page_random, page_touched, page_latest, page_len)/g' pageC.sql > paged.sql
Changing the whole line to the new content:
sed '/INSERT INTO `page`/ c\
INSERT INTO `page` (page_id, page_namespace, page_title, page_restrictions, page_counter, page_is_redirect, page_is_new, page_random, page_touched, page_latest, page_len)
' pageC.sql > paged.sql
Substitution of text
sed 's/INSERT INTO `page`/& (page_id, page_namespace, page_title, page_restrictions, page_counter, page_is_redirect, page_is_new, page_random, page_touched, page_latest, page_len)/' pageC.sql > paged.sql
The & in the replacement stands for the matched text, so the column list is appended to the original INSERT INTO `page`. There is no need for g unless the pattern occurs several times on the SAME line (sed works line by line, one line at a time by default).
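As for the Windows part of the question, a guess: if the command is run from cmd.exe rather than a POSIX shell, cmd does not treat single quotes as quoting characters, so sed never sees the script as a single argument. Since the script contains no double quotes, simply wrapping it in double quotes should work there:
sed "s/INSERT INTO `page`/& (page_id, page_namespace, page_title, page_restrictions, page_counter, page_is_redirect, page_is_new, page_random, page_touched, page_latest, page_len)/" pageC.sql > paged.sql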

AWK - filter file with not equal fields

I've been trying to pull a field from the rows of a file, but the rows don't all have the same number of fields; any given row may have two or three fields more or fewer than another.
Here is a snippet:
A orarpp 45286124 1 1 0 20 60 Nov 25 9-16:42:32 01:04:58 11176 117056 0 - oracleXXX (LOCAL=NO)
A orarpp 45351560 1 1 3 20 61 Nov 30 5-03:54:42 02:24:48 4804 110684 0 - ora_w002_XXX
A orarpp 45548236 1 1 22 20 71 Nov 26 8-19:36:28 00:56:18 10628 116508 0 - oracleXXX (LOCAL=NO)
A orarpp 45679190 1 1 0 20 60 Nov 28 6-23:42:20 00:37:59 10232 116112 0 - oracleXXX (LOCAL=NO)
A orarpp 45744808 1 1 0 20 60 10:52:19 23:08:12 00:04:58 11740 117620 0 - oracleXXX (LOCAL=NO)
A root 45810380 1 1 0 -- 39 Nov 25 9-19:54:34 00:00:00 448 448 0 - garbage
In the case of the first line, I'm interested in 9-16:42:32 and the similar fields for each row.
I've tried to pull it by using ':' as the field separator and then filtering from there; however, what I am trying to accomplish is to do something when the number before the dash (in the example it's 9) is greater than one.
cat file.txt | grep oracle | awk -F: '{print substr($1, length($1)-5)}'
This is because the number of fields on either side of the actual field I need can be different from line to line.
It's definitely not the most efficient approach, but I've been trying to do this with an awk one-liner.
Hints or a direction to get me moving again would be appreciated. I am not opposed to doing it in a better way than awk.
Thanks.
Maybe cut is the right tool for this job? For example, with your snippet:
$ cut -c 62-71 file.txt
9-16:42:32
5-03:54:42
8-19:36:28
6-23:42:20
23:08:12
9-19:54:34
The arguments tell cut to snip columns (-c) 62 through 71.
For additional processing, you can pipe it to awk.
You can also accomplish the whole thing in awk by accepting entire lines and then using substr to extract the columns you want. For example, this awk command produces the same output as the cut command above:
awk '{ print substr($0, 62, 10) }' file.txt
Whether you create a pipeline or do the processing entirely in awk is at least in part a matter of personal taste / style.
Would this do?
awk -F: '/oracle/ {print substr($0,62,10)}' file.txt
9-16:42:32
8-19:36:28
6-23:42:20
23:08:12
This searches for oracle and then prints 10 characters starting from position 62.
You can grab those identifiers with one of
grep -o '[[:digit:]]\+-[[:digit:]]\{2\}:[[:digit:]]\{2\}:[[:digit:]]\{2\}'
grep -oP '\d+-\d\d:\d\d:\d\d' # GNU grep
It sounds like you want to do something with the lines, not just find the ids. Please elaborate.
Using GNU awk:
gawk --re-interval '
/oracle/ && \
match($0, /([[:digit:]]+)-([[:digit:]]{2}:){2}[[:digit:]]{2}/, a) && \
a[1]>1 {
# do something with the matching line
print
}
' file
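If --re-interval or the three-argument match() is not available (both are GNU awk features), a portable sketch is to scan each field for one that looks like the elapsed-time column, which also sidesteps the varying field counts (the >1 test on the days comes from the question):
awk '/oracle/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {   # e.g. 9-16:42:32
            split($i, t, "-")                                     # t[1] = days before the dash
            if (t[1] + 0 > 1) print                               # do something with the line
        }
}' file.txt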

sed remove line containing a string and nothing but; automation using for loop

Q1: Make sed match the whole line, and delete the line only if it contains nothing but the string
I have a file that contains several of the following numbers:
1 1
3 1
12 1
1 12
25 24
23 24
I want to delete the lines in which the two numbers are the same. For that I have been using either:
sed '/1 1/d' < old.file > new.file
OR
sed -n '/1 1/!p' < old.file > new.file
Here is the main problem: if I search for the pattern '1 1', I get rid of '1 12' as well. So I want the pattern to match the whole line, and to delete the line only if it does.
Q2: Automation of question 1
I am also trying to automate this problem. The range of numbers in the first column and the second column could be from 1 to 25.
So far this is what I got:
for ((i=1;i<26;i++)); do
sed "/'$i' '$i'/d" < oldfile > newfile; mv newfile oldfile;
done
This does nothing to the oldfile in the end. :(
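For the record, the Q2 loop does nothing because the single quotes sit inside double quotes: they become literal characters, so sed searches for the pattern '1' '1', quotes included. Dropping them and anchoring the pattern makes the loop work, though the backreference answers below make the loop unnecessary:
for ((i=1;i<26;i++)); do
    sed "/^$i $i$/d" < oldfile > newfile && mv newfile oldfile
done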
This would be more readable with awk:
awk '$1 == $2 {next} {print}' oldfile > newfile
Update based on comment:
If the requirement is to remove lines where the two values are within 1 of each other:
awk '{d = $1-$2; if (-1 <= d && d <= 1) next; else print}' oldfile
Unfortunately, awk does not have abs() (at least nawk and gawk don't)
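You can define one yourself, though; this variant of the same logic keeps only the lines whose values differ by more than 1:
awk 'function abs(x) { return x < 0 ? -x : x }   # home-made absolute value
     abs($1 - $2) > 1' oldfile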
Just put the first number in a group (\([0-9]*\)) and then look for it with a backreference (\1). Since the line to delete should contain only the group, repeated, use the ^ to mark the beginning of line and the $ to mark the end of line. For example, for the following file:
$ cat input
1 1
3 1
12 1
1 12
12 12
12 13
13 13
25 24
23 24
...the result is:
$ sed '/^\([0-9]*\) \1$/d' input
3 1
12 1
1 12
12 13
25 24
23 24
You can also do it with grep:
grep -E -v '^([0-9]+) \1$' testfile
Capture the run of digits in a group and anchor the pattern, so a line is removed only when it consists of the remembered digits, a space, and those same digits again. (Without the anchors, and with the original ([0-9])* which remembers only the last digit matched, the pattern would also delete lines like 1 12. Backreferences in grep -E patterns are a GNU extension.)