extract lines between same keyword that match a pattern

I need to match a pattern that is unique within the file and print all the lines between the two markers that enclose the matching line.
My file looks like this.
echo "Start 2A25.20090401.64809.7.HDF 6420 6751"
echo "dimensions 9249 49"
echo "New Cell"
grep "6542,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 20.09 8.07334 74.6131 170 0 6 6
grep "6542,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 32.25 8.11139 74.6406 210 3.66764
grep "6543,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 33.28 8.05147 74.6431 210 0.84248
grep "6543,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 43.38 8.08952 74.6707 210 20.3994
grep "6543,08" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 24.22 8.12717 74.6979 210 1.21783
grep "6544,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 35.81 8.02963 74.6732 210 6.31353
grep "6544,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 41.58 8.06767 74.7007 200 14.5371
grep "6545,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 36.3 8.00776 74.7033 120 6.13395
grep "6545,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 31.57 8.0458 74.7308 210 4.22794
grep "6546,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 28.49 7.98589 74.7333 292 2.64533
echo "New Cell"
grep "6562,21" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 26.74 8.19021 75.6125 210 0.61061 9 9
grep "6563,20" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 26.35 8.13187 75.6167 210 1.0852
grep "6563,21" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 42.51 8.16825 75.6426 200 13.5489
grep "6563,22" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 25.82 8.20457 75.6684 210 0.615512
grep "6564,20" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 23.08 8.10994 75.6467 272 0.613962
grep "6564,21" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 46.55 8.14632 75.6726 200 17.1675
grep "6564,22" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 36.89 8.18263 75.6984 200 3.10095
grep "6565,21" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 31.61 8.12436 75.7026 200 2.52639
grep "6565,22" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 28.85 8.16067 75.7284 120 0.945648
echo "New Cell"
I need sed to match the pattern and print all the lines in the cell where the pattern matched.
For example, for the pattern "6545,06" I need all the lines between the "New Cell" boundaries that enclose the match. For this pattern the output should be:
grep "6542,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 20.09 8.07334 74.6131 170 0 6 6
grep "6542,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 32.25 8.11139 74.6406 210 3.66764
grep "6543,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 33.28 8.05147 74.6431 210 0.84248
grep "6543,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 43.38 8.08952 74.6707 210 20.3994
grep "6543,08" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 24.22 8.12717 74.6979 210 1.21783
grep "6544,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 35.81 8.02963 74.6732 210 6.31353
grep "6544,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 41.58 8.06767 74.7007 200 14.5371
grep "6545,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 36.3 8.00776 74.7033 120 6.13395
grep "6545,07" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 31.57 8.0458 74.7308 210 4.22794
grep "6546,06" ../TextFilesDir/out.2A25.20090401.64809.7.HDF.txt.text = 28.49 7.98589 74.7333 292 2.64533
Unfortunately begin and end boundaries are the same.
Would be grateful if I can get a sed script to do this.

I do not have sed right now, but the following regex extracts exactly what you want (if I understood it correctly):
echo "New Cell"\s*(.*?"6542,06".*?)\s*echo "New Cell"
You can extract only the "grep" lines using \1. Because the match spans several lines, the regex must run in a mode where . also matches newlines.
Replace "6542,06" in the regex with whatever substring you want to find.
I tested the regex here.

This might work for you (GNU sed):
sed '/New Cell/ba;H;$!d;:a;x;/6546,06/s/.//p;z;x;d' file
Gather up the lines following a line containing New Cell in the hold space (HS).
On encountering another line containing New Cell, or on reaching the end of the file, check the collection for the required string (6546,06 in the example above) and, if it is present, print the collection minus its first character, which is an introduced newline.
Regardless of a match, empty the HS and repeat.
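
The z command used above is a GNU extension, so on other systems the same buffering idea can be sketched with awk instead. This is only a sketch, not the answer's method: the file contents are a shortened, hypothetical stand-in for the real data, and the names pattern, block, and found are illustrative.

```shell
# Shortened, hypothetical sample: two cells delimited by "New Cell" lines.
cells=$(mktemp)
cat > "$cells" <<'EOF'
echo "New Cell"
grep "6545,06" first.txt
grep "6545,07" first.txt
echo "New Cell"
grep "6562,21" second.txt
echo "New Cell"
EOF

pattern='6545,06'

# Buffer each cell; when a boundary (or the end of the file) is reached,
# print the buffer only if the pattern was seen inside it.
result=$(awk -v pat="$pattern" '
  /New Cell/ { if (found) printf "%s", block; block = ""; found = 0; next }
             { block = block $0 "\n"; if (index($0, pat)) found = 1 }
  END        { if (found) printf "%s", block }
' "$cells")

printf '%s\n' "$result"
rm -f "$cells"
```

index() does a plain substring test, so the pattern is not interpreted as a regular expression.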

Related

Unique count of a value in a zipped file based on other constraints on surrounding lines

I have a log file.
Has data like this:
Operation=ABC,
CustomerId=12,
..
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=0,
----
Operation=CQW,
CustomerId=10,
Time=blah,
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=0,jvnf=2,njfs=4
----
Operation=ABC,
CustomerId=12,
Metric=blah
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=1, uisg=2,vieus=3
----
Operation=ABC,
CustomerId=12,
Metric=blah
..
..
Counters=qwe=1,wer=2,mbn=4,Hello:0, uisg=2,vieus=3
----
Now, I want to find all the unique CustomerIds where Operation=ABC and Hello=0 (in Counters).
All of this info is contained in .gz files in a directory.
So, here is what I've tried to just retrieve the number of times Operation=ABC and "Hello=0" appears in the lines near it.
zgrep -A 20 "Operation=ABC" * | grep "Hello=0" | wc -l
This gave me the number of times that "Hello=0" was found for Operation=ABC. (about 250)
In order to get unique customer Ids, I tried this:
zgrep -A 20 "Operation=ABC" * | grep "Hello=0" -B 10 | grep "CustomerId" | uniq -c
This gave me no results. What am I getting wrong here?

Actually, this works. I was just being impatient.
zgrep -A 20 "Operation=ABC" * | grep "Hello=0" -B 10 | grep "CustomerId" | uniq -c
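
One caveat with this pipeline: uniq only collapses adjacent duplicate lines, so a CustomerId that appears in non-consecutive matches is reported more than once. Sorting before uniq avoids that. A small sketch with made-up ids:

```shell
# Made-up input: the same id appears twice, but not on adjacent lines.
ids=$(printf '%s\n' 'CustomerId=12,' 'CustomerId=10,' 'CustomerId=12,')

# uniq alone sees three separate runs.
without_sort=$(printf '%s\n' "$ids" | uniq -c | wc -l | tr -d ' ')

# sort groups equal lines together, so uniq reports two distinct ids.
with_sort=$(printf '%s\n' "$ids" | sort | uniq -c | wc -l | tr -d ' ')

echo "without sort: $without_sort, with sort: $with_sort"
```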

You do not need this many grep and zgrep calls; it can be done within a single awk:
awk -F'=' '
/^--/{
if(val==3){
print value
}
val=value=""
}
/Operation=ABC/{
val++
}
/CustomerId/{
if(!a[$NF]++){
val++
}
}
/Hello=0/{
val++
}
{
value=(value?value ORS:"")$0
}
END{
if(val==3){
print value
}
}' <(gzip -dc input_file.gz)
Output will be as follows (tested with your sample only):
Operation=ABC,
CustomerId=12,
..
..
..
Counters=qwe=1,wer=2,mbn=4,Hello=0,
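
If only the distinct CustomerIds are wanted rather than the whole record, a line-by-line variant can be sketched as follows. It assumes that every record ends with a ---- line and that Operation, CustomerId, and Counters each start their own line, as in the sample. The flag names op and hello are illustrative, and the temporary file stands in for the output of gzip -dc on the real logs.

```shell
# Shortened copy of the sample log (the ".." filler lines are omitted).
log=$(mktemp)
cat > "$log" <<'EOF'
Operation=ABC,
CustomerId=12,
Counters=qwe=1,wer=2,mbn=4,Hello=0,
----
Operation=CQW,
CustomerId=10,
Counters=qwe=1,wer=2,mbn=4,Hello=0,jvnf=2,njfs=4
----
Operation=ABC,
CustomerId=12,
Counters=qwe=1,wer=2,mbn=4,Hello=1, uisg=2,vieus=3
----
EOF

# Remember what was seen in the current record; decide at the ---- line.
result=$(awk -F'=' '
  /^Operation=/  { op = ($2 == "ABC,") }
  /^CustomerId=/ { id = $2; sub(/,$/, "", id) }
  /^Counters=/   { hello = ($0 ~ /Hello=0(,|$)/) }
  /^----/        { if (op && hello && !seen[id]++) print id
                   op = hello = 0; id = "" }
' "$log")

printf '%s\n' "$result"
rm -f "$log"
```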

search and print the value inside tags using script

I have a file like this. abc.txt
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
What I have to do is find each <ra> tag and, inside it, every <a> tag, and store the values of those <a> tags in variables that I need to process further. How should I do this?
The values of the <a> tags inside the <ra> tags are:
34.908,234.09,23
345,345

This awk should do:
cat file
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra><a>12344</a><ra><e>45</e><a>666</a></ra>
<hello>sadfaf</hello>
<hi>no print from this line</hi><a>256</a>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
awk -v RS="<" -F">" '/^ra/,/\/ra/ {if (/^a>/) print $2}' file
34.908
234.09
23
666
345
345
It also handles multiple <ra>...</ra> groups on one line.
A small variation:
awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a/ {print $2} /^ra/ {f=1}' file
34.908
234.09
23
666
345
345
How does it work:
awk -v RS="<" -F">" ' # Set the record separator to <, so every < starts a new record
/^ra/,/\/ra/ { # from a record starting with "ra" to a record containing "/ra" do
if (/^a>/) # if the record starts with "a>" do
print $2}' # print field 2
To see how changing RS works try:
awk -v RS="<" '$1=$1' file
ra>
r>12.34
/r>
e>235
/e>
a>34.908
/a>
r>23
/r>
a>234.09
/a>
p>234
...
To store it in a variable you can do as BMW suggested:
var=$(awk ...)
var=$(awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a/ {print $2} /^ra/ {f=1}' file)
echo $var
34.908 234.09 23 666 345 345
echo "$var"
34.908
234.09
23
666
345
345
Since there are many values, you can use an array:
array=($(awk -v RS=\< -F\> '/\/ra/ {f=0} f&&/^a/ {print $2} /^ra/ {f=1}' file))
echo ${array[2]}
23
echo ${array[0]}
34.908
echo ${array[*]}
34.908 234.09 23 666 345 345
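
The question's expected output groups the values per <ra> block, one comma-separated line each. A sketch of that variation, using the same RS and -F trick: the names inside and out are illustrative, and it assumes <ra> blocks are never nested.

```shell
# The question's abc.txt, written to a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
EOF

# Start a new list at <ra>, append each <a> value, print the list at </ra>.
result=$(awk -v RS='<' -F'>' '
  /^ra>/          { inside = 1; out = ""; next }
  /^\/ra>/        { if (inside) print out; inside = 0; next }
  inside && /^a>/ { out = (out == "" ? $2 : out "," $2) }
' "$xml")

printf '%s\n' "$result"
rm -f "$xml"
```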

Use GNU grep's lookahead and lookbehind zero-length assertions:
grep -oP "(?<=<ra>).*?(?=</ra>)" file |grep -Po "(?<=<a>).*?(?=</a>)"
Explanation:
The first grep extracts the content of each ra tag. Even if there are several ra tags on one line, each one is still identified.
The second grep extracts the content of each a tag.

subtracting the numbers in first column with awk or sed

I have a text file as shown below. I would like to subtract the numbers in the first column and add a new column with the calculated values (as absolute values) to the input files themselves, instead of just printing the output. How can I do this for multiple files with awk or sed?
46-104 46 3.95073
46-46 46 1.45997
50-50 50 1.51589
52-100 52 4.16567
desired output
46-104 46 3.95073 58
46-46 46 1.45997 0
50-50 50 1.51589 0
52-100 52 4.16567 48

Here's the quick way using awk:
awk '{ split($1,a,"-"); print $0, (a[1]-a[2] >= 0 ? a[1]-a[2] : a[2]-a[1]) | "column -t" }' file
Results:
46-104 46 3.95073 58
46-46 46 1.45997 0
50-50 50 1.51589 0
52-100 52 4.16567 48
For multiple files, assuming you only have files of interest in your present working directory:
for i in *; do awk '{ split($1,a,"-"); print $0, (a[1]-a[2] >= 0 ? a[1]-a[2] : a[2]-a[1]) | "column -t" }' "$i" > "$i.temp" && mv "$i.temp" "$i"; done

With awk:
function abs(value)
{
return (value<0?-value:value);
}
{
split($1, v, "-")
print $0, "\t", abs(v[1]-v[2])
}
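
The script above is only the awk program body. One way to apply it to several files in place is sketched below. The directory, the file names, and the .tmp suffix are hypothetical, and each file is replaced only if awk succeeds.

```shell
# Hypothetical working directory with two small input files.
dir=$(mktemp -d)
printf '%s\n' '46-104 46 3.95073' '46-46 46 1.45997' > "$dir/one.txt"
printf '%s\n' '52-100 52 4.16567' > "$dir/two.txt"

for f in "$dir"/*.txt; do
  # Write to a temporary file first, then replace the original.
  awk '
    function abs(value) { return value < 0 ? -value : value }
    { split($1, v, "-"); print $0, abs(v[1] - v[2]) }
  ' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```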

This might work for you (GNU sed and bc):
sed -r 'h;s/(\S+).*/echo \1|bc/e;H;x;s/\n-?/ /' file
h copy the line to the hold space (HS)
s/(\S+).*/echo \1|bc/e do the maths in the pattern space
H append a newline and the answer to the original line in the HS
x swap PS and HS
s/\n-?/ / substitute the newline and any negative symbol with a space
Along the same lines using awk:
awk '{split($1,a,"-");b=a[1]-a[2];sub(/-/,"",b);print $0,b}' file

How to use sed to output lines containing odd numbers which contain an even digit?

How do you use sed to output only the lines that contain odd numbers which themselves contain an even digit, assuming that each line contains only a number?
E.g.
seq 1000 | sed ...
Output ends with:
.
.
.
963
965
967
969
981
983
985
987
989

seq 1000 | sed -n '/[24680].*[13579]$/ p'
This is essentially using sed to emulate grep. More direct, then:
seq 1000 | grep '[24680].*[13579]$'

Try:
seq 1000 | sed -ne '/[02468]/ { /[13579]$/ p }'

This might work for you:
seq 1 1000 | sed '/[13579]\>/!d;/[02468]/!d'
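
The variants above should all select the same lines: an odd number ends in an odd digit, so any even digit it contains must come before that last digit. A quick sanity check, written with ; before } so that it does not rely on GNU sed (the variable names are illustrative):

```shell
# Single-regex form, with sed and with grep.
from_sed=$(seq 1000 | sed -n '/[24680].*[13579]$/p')
from_grep=$(seq 1000 | grep '[24680].*[13579]$')

# Nested form: first require an even digit, then an odd last digit.
from_nested=$(seq 1000 | sed -n '/[02468]/{/[13579]$/p;}')

# Compare the results and show the end of the list.
[ "$from_sed" = "$from_grep" ] && [ "$from_sed" = "$from_nested" ] && echo "all variants agree"
last_three=$(printf '%s\n' "$from_sed" | tail -n 3)
printf '%s\n' "$last_three"
```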

Collect numerals at the beginning of the file

I have a text file which contains some numerals, for example,
There are 60 nuts and 35 apples,
but only 24 pears.
I want to collect these numerals (60, 35, 24) at the beginning of the same file, in particular, I want after processing, the file to read
read "60"
read "35"
read "24"
There are 60 nuts and 35 apples,
but only 24 pears.
How could I do this using one of the text manipulating tools available in *nix?

You can script an ed session to edit the file in place:
{
echo 0a # insert text at the beginning of the file
grep -o '[0-9]\+' nums.txt | sed 's/.*/read "&"/'
echo ""
echo . # end insert mode
echo w # save
echo q # quit
} | ed nums.txt
More succinctly:
printf "%s\n" 0a "$(grep -o '[0-9]\+' nums.txt|sed 's/.*/read "&"/')" "" . w q | ed nums.txt

One way to do it is:
egrep -o '[0-9]+' input | sed -re 's/([0-9]+)/read "\1"/' > /tmp/foo
cat input >> /tmp/foo
mv /tmp/foo input
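
A variant of the same idea that avoids a fixed file name in /tmp can be sketched with mktemp. The sample file stands in for the real input, and grep -o is assumed to be available (GNU and BSD grep have it, though POSIX does not require it).

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' 'There are 60 nuts and 35 apples,' 'but only 24 pears.' > "$input"

# Build the new content (the read lines, then the original text)
# in a second temporary file, and replace the original on success.
tmp=$(mktemp)
{
  grep -oE '[0-9]+' "$input" | sed 's/.*/read "&"/'
  cat "$input"
} > "$tmp" && mv "$tmp" "$input"

result=$(cat "$input")
printf '%s\n' "$result"
rm -f "$input"
```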
