Sed command help to summarize similar log messages

I'm trying to build a log-file summarisation tool for an application that writes many near-duplicate entries, differing only in a suffix that indicates the point of execution.
Here's a generic version:
A text file (infile_grocery.txt) with these contents:
milk skim fruit apple banana
milk skim fruit orange
milk skim fruit mango
milk skim fruit pomegranate
milk 2 percent fruit cherry tomato
milk 2 percent fruit peach
milk whole fruit pineapple
milk skim fruit strawberry raspberry
milk skim fruit strawberry rhubarb
milk whole fruit pineapple
What I'm hoping to get is:
milk skim fruit apple banana, orange, mango, pomegranate
milk 2 percent fruit cherry tomato, peach
milk whole fruit pineapple
milk skim fruit strawberry raspberry, strawberry rhubarb
milk whole fruit pineapple
The command line I've currently cooked up is:
sed -rn "{H;x;s|^(.+) fruit ([^\n]+)\n(.*)\1 fruit (.+)$|\1 fruit \2, \4|;x}; ${x;s/^\n//;p}" infile_grocery.txt
But the results I'm getting are:
milk skim fruit apple banana, mango, strawberry raspberry
milk skim fruit strawberry rhubarb
milk whole fruit pineapple
I'm discarding input somehow. Any gurus with a better idea how to structure this?

This is an awk solution:
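awk -F fruit '
NR > 1 && $1 == x {
    printf ",%s", $2    # same text before "fruit" as the previous line: append the suffix
    next
}
{
    x = $1
    printf "%s%s", (NR > 1 ? "\n" : ""), $0    # new group: start a new output line, no blank line before the first
}
END {
    print ""    # terminate the last output line
}' input.txt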
Output
milk skim fruit apple banana, orange, mango, pomegranate
milk 2 percent fruit cherry tomato, peach
milk whole fruit pineapple
milk skim fruit strawberry raspberry, strawberry rhubarb
milk whole fruit pineapple

This might work for you (GNU sed):
sed ':a;$!N;s/^\(\(.*fruit\).*\)\n\2\(.*\)/\1,\3/;ta;P;D' file
Explanation:
:a is a placeholder (label) for a loop.
$!N appends a newline followed by the next line, except on the last line.
s/^\(\(.*fruit\).*\)\n\2\(.*\)/\1,\3/ captures everything up to the newline into back reference 1 (aka \1). Within this, everything from the beginning of the line up to and including the word fruit is captured into back reference 2 (aka \2). Everything following the matching \2 on the next line is captured into back reference 3 (aka \3). The match is replaced with back reference 1, followed by a comma and then back reference 3 (which starts with the space that followed fruit).
ta if the substitution succeeded, loop back to the placeholder :a.
P if the substitution failed, print up to and including the first newline in the pattern space.
D if the substitution failed, delete up to and including the first newline in the pattern space, then start the next cycle with what remains, without reading a new line.
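A minimal way to try the one-liner on the sample from the question, assuming GNU sed is installed as `sed`. The function name `summarize` and the variable `sample` are only illustrative:

```shell
# Hypothetical wrapper around the GNU sed one-liner above.
#   :a;$!N   append the next input line to the pattern space
#   s/.../   if that line starts with the same "...fruit" prefix, merge it as ", <suffix>"
#   ta       after a successful merge, go back to :a for another line
#   P;D      otherwise print the finished first line and restart with the remainder
summarize() {
    sed ':a;$!N;s/^\(\(.*fruit\).*\)\n\2\(.*\)/\1,\3/;ta;P;D' "$@"
}

sample='milk skim fruit apple banana
milk skim fruit orange
milk skim fruit mango
milk skim fruit pomegranate
milk 2 percent fruit cherry tomato
milk 2 percent fruit peach
milk whole fruit pineapple
milk skim fruit strawberry raspberry
milk skim fruit strawberry rhubarb
milk whole fruit pineapple'

printf '%s\n' "$sample" | summarize
```

Only adjacent lines are merged, so the two milk skim groups and the two pineapple lines stay separate, which is the output the question asks for.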

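opref=""
nline=""
while IFS= read -r line; do
    pref=$(printf '%s\n' "$line" | sed 's/\(.*fruit\).*/\1/')    # text up to and including "fruit"
    item=$(printf '%s\n' "$line" | sed 's/.*fruit[[:space:]]//')  # text after "fruit "
    if [ "$opref" = "$pref" ]; then
        nline="$nline, $item"                      # same prefix: append the item
    else
        [ -n "$nline" ] && printf '%s\n' "$nline"  # flush the previous group
        nline=$line
    fi
    opref=$pref
done < input_file
[ -n "$nline" ] && printf '%s\n' "$nline"          # flush the last group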
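The loop above starts two sed processes for every input line. A sketch of the same grouping logic using only POSIX parameter expansion, assuming every line contains the word fruit; the function name `summarize_fruit` is hypothetical:

```shell
# Hypothetical function: same grouping as the loop above, but the prefix and
# item are split with parameter expansion instead of two sed calls per line.
# Assumes every input line contains the word "fruit".
summarize_fruit() {
    opref=""
    nline=""
    while IFS= read -r line; do
        pref=${line%fruit*}fruit   # everything up to and including the last "fruit"
        item=${line##*fruit}       # everything after the last "fruit"...
        item=${item# }             # ...without its leading space
        if [ "$opref" = "$pref" ]; then
            nline="$nline, $item"  # same prefix as the previous line: append
        else
            if [ -n "$nline" ]; then printf '%s\n' "$nline"; fi   # flush previous group
            nline=$line
        fi
        opref=$pref
    done
    if [ -n "$nline" ]; then printf '%s\n' "$nline"; fi           # flush the last group
}

printf '%s\n' 'milk skim fruit orange' 'milk skim fruit mango' 'milk whole fruit pineapple' |
    summarize_fruit
```

Splitting inside the shell avoids one process per field per line, which matters on large log files.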

Related

Sed Pattern Match then Append to Line

I have the lines below and I'm trying to append "Check" to the line that starts with Apples. Does anyone know how I can get "Check" onto the same line as Apples, rather than on a new line, and print the output? I wasn't getting anywhere on my own.
Thanks
What I have:
Grocery store bank and hardware store
Apples Bananas Milk
What I want:
Grocery store bank and hardware store
Apples Bananas Milk Check
What I tried:
sed -i '/^Apples/a Check' file
What I got:
Grocery store bank and hardware store
Apples Bananas Milk
Check

This might work for you (GNU sed):
sed '/Apples/s/$/ Check/' file
If a line contains Apples, append the string Check; $ is an anchor that represents the end of the line.
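A small check of the end-of-line anchor on the two lines from the question. The function name `append_check` is hypothetical, and the address is anchored with ^ here so that only lines starting with Apples are changed:

```shell
# Substitute the empty match at end of line ($) with " Check",
# but only on lines that start with "Apples".
append_check() {
    sed '/^Apples/s/$/ Check/' "$@"
}

printf '%s\n' 'Grocery store bank and hardware store' 'Apples Bananas Milk' | append_check
```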

The problem is that the a command appends a new line after the matching line; see this reference:
The "a" command appends a line after the range or pattern.
What you want is a plain substitution. There may be some further tweaks you would like, so here are some suggestions:
sed -i 's/Apples/& Check/g' file # Adds ' Check' after each 'Apples'
sed -i 's/\<Apples\>/& Check/g' file # Only adds ' Check' after 'Apples' as a whole word
sed -i -E 's/\<Apples(\s+Check)?\>/Apples Check/g' file # Adds ' Check' after 'Apples' unless it is already there
Note these suggestions are for GNU sed only. \< and \> in GNU sed patterns are word boundaries, \s+ matches one or more whitespace characters in GNU sed POSIX ERE patterns, and -E enables the POSIX ERE pattern syntax.
Demo:
#!/bin/bash
s='Grocery store bank and hardware store
Apples Bananas Milk'
sed 's/Apples/& Check/g' <<< "$s"
sed 's/\<Apples\>/& Check/g' <<< "$s"
sed -E 's/\<Apples(\s+Check)?\>/Apples Check/g' <<< "$s"
Output in each case is:
Grocery store bank and hardware store
Apples Check Bananas Milk
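Because & re-inserts the entire match, a replacement of the form "& Check" turns an existing "Apples Check" into "Apples Check Check". One way to make the edit safe to run more than once is to write the literal text back instead. A sketch assuming GNU sed (for \<, \> and \s); the function name is hypothetical:

```shell
# Replace "Apples", optionally already followed by " Check", with the literal
# text "Apples Check"; running it a second time leaves the line unchanged.
add_check_once() {
    sed -E 's/\<Apples(\s+Check)?\>/Apples Check/g' "$@"
}

s='Grocery store bank and hardware store
Apples Bananas Milk'
once=$(printf '%s\n' "$s" | add_check_once)
twice=$(printf '%s\n' "$once" | add_check_once)
printf '%s\n' "$twice"
```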

Using sed
$ sed '/^Apples/s/.*/& Check/' input_file
Grocery store bank and hardware store
Apples Bananas Milk Check
This matches lines that begin with Apples and replaces the whole line with itself (&) followed by Check.

sed - conditional replacement of similar string patterns

I'm trying to replace/append strings in text files with sed.
Another string ("green") needs to be matched before a replacement is allowed.
The strings are very similar:
apple >> apple_tree
apple X1 >> apple_tree
appleX1 >> apple_tree
This code works when the condition is ignored and only the first occurrence of any of the strings should be replaced:
find . -type f -exec sed -i '0,/apple|apple X1|appleX1/{s/apple|apple X1|appleX1/apple_tree/}' {} +
This code cannot be executed (the prompt just waits):
find . -type f -exec sed -i '/green/,// /apple|apple A0|appleA0/{s/apple|apple A0|appleA0/apple_tree/}' {} +
This one does work, but without the alternative patterns:
find . -type f -exec sed -i '/green/,// s/apple/apple_tree/' {} +
Unfortunately, word-boundary anchors like \<apple\> do not work either.
The OS is Ubuntu 20.04. The solution is not restricted to sed.
Thank you.
Edit:
Input:
orange
pear
kiwi
apple X1
mango
banana
apple
green
orange
pear
kiwi
apple X1
mango
appleX1
banana
apple
After running the code, the desired output is:
orange
pear
kiwi
apple X1
mango
banana
apple
green
orange
pear
kiwi
apple_tree
mango
apple_tree
banana
apple_tree

You seem to want:
sed -i -e '/green/,$s/apple\( X1\|X1\|\)/apple_tree/' file
Check out https://www.gnu.org/software/sed/manual/sed.html https://www.grymoire.com/Unix/Sed.html https://regexcrossword.com/
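A sketch for checking the range address on the sample input from the question, assuming GNU sed (\? is a GNU extension in basic regular expressions). The optional group ` \?X1` is an alternative way to write the three-way alternation, and the function name is hypothetical:

```shell
# /green/,$ selects the first line containing "green" through the last line;
# inside that range "apple", "apple X1" and "appleX1" become "apple_tree".
replace_after_green() {
    sed '/green/,$s/apple\( \?X1\)\?/apple_tree/' "$@"
}

printf '%s\n' orange pear kiwi 'apple X1' mango banana apple \
              green orange pear kiwi 'apple X1' mango appleX1 banana apple |
    replace_after_green
```

For in-place edits, the same expression can be used with sed -i inside the question's find command.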

sed `D` with address range

As explained in the manual, D deletes the portion of the pattern space up to the first embedded newline. But I cannot find any documentation explaining D combined with address ranges. For example:
$ cat /tmp/test
accident if I use one.
My wife won't let me buy a power saw. She is afraid of an
$ cat /tmp/test | sed -ne '$p;:a;N;$!{ba};2,$D'
accident if I use one.
My wife won't let me buy a power saw. She is afraid of an
It looks like, if there are two or more lines in the pattern space, the portion up to the first embedded newline is deleted.
Is there any official documentation for this?
And why does 2D not work at all?
$ cat /tmp/test | sed -ne '$p;:a;N;$!{ba};2D'
Nothing is shown for the above command.

Replace first occurrence of a pattern if not preceded with another pattern

Using GNU sed, I'm trying to replace the first occurrence of a pattern in a file, but I don't want to replace it if another pattern appears before the match.
For example, if the file contains a line with "bird [number]", I want to replace the number with "0" if the word "cat" does not appear anywhere before it.
Example text
dog cat - fish bird 123
dog fish - bird 1234567
dog - cat fish, lion bird 3456
Expected result:
dog cat - fish bird 123
dog fish - bird 0
dog - cat fish, lion bird 3456
I tried to combine the answers to How to use sed to replace only the first occurrence in a file? and Sed regex and substring negation, and came up with something like
sed -E '0,/cat.*bird +[0-9]+/b;/(bird +)[0-9]+/ s//\10/'
where 0,/cat.*bird +[0-9]+/b;/(bird +)[0-9]+/ should match the first occurrence of (bird +)[0-9]+ if the cat.*bird +[0-9]+ pattern does not match, but I get
dog cat - fish bird 123
dog fish - bird 0
dog - cat fish, lion bird 0
The third line is also changed. How can I prevent that? I think it is related to address ranges, but I don't understand how to negate the second part of the address range.

This might work for you (GNU sed):
sed '/\<cat\>.*\<bird\>/b;s/\<\(bird\) \+[0-9]\+/\1 0/;T;:a;n;ba' file
If a line contains the word cat before the word bird, end processing for that line (it is printed unchanged).
Otherwise, try to replace the number following the word bird with zero. If that fails, end processing for that line. If it succeeds, read and print all following lines until the end of the file.
Might also be written:
sed -E '/cat.*bird/b;/(bird +)[0-9]+/{s//\10/;:a;n;ba}' file
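To see the first-occurrence behaviour, the first command can be run on the example plus one extra, hypothetical line: everything after the first substitution is copied through unchanged by the :a;n;ba loop. This assumes GNU sed (\<, \>, \+ and T are GNU extensions), and the function name is illustrative:

```shell
# b          skip lines where the word "cat" comes before "bird"
# s/.../     otherwise zero the number after "bird"
# T          no substitution made: end this cycle (line is printed as-is)
# :a;n;ba    after the first substitution, print and read the remaining lines unchanged
zero_first_bird() {
    sed '/\<cat\>.*\<bird\>/b;s/\<\(bird\) \+[0-9]\+/\1 0/;T;:a;n;ba' "$@"
}

printf '%s\n' 'dog cat - fish bird 123' 'dog fish - bird 1234567' \
              'dog - cat fish, lion bird 3456' 'dog fish - bird 42' |
    zero_first_bird
```

The extra last line also qualifies, but it keeps its number because only the first eligible line is changed.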

sed is for doing simple s/old/new replacements, that is all. For anything else just use awk, e.g. with GNU awk instead of the GNU sed you were using:
$ awk 'match($0,/(.*bird\s+)[0-9]+(.*)/,a) && (a[1] !~ /cat/) {$0=a[1] 0 a[2]} 1' file
dog cat - fish bird 123
dog fish - bird 0
dog - cat fish, lion bird 3456
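The awk command above zeroes the number on every line without a preceding cat, and match() with an array argument is a GNU awk extension. A sketch of a variant that should run in any POSIX awk and, like the sed answers, stops after the first eligible line; the `done` flag and the function name are illustrative, and it looks at the first "bird <number>" on each line:

```shell
# Hypothetical function. done: set after the first eligible line is changed.
# For each line, find the first "bird <number>"; if the text before it does
# not contain "cat", replace the number with 0. Every line is printed.
zero_first_bird_awk() {
    awk '
    !done && match($0, /bird +[0-9]+/) {
        pre = substr($0, 1, RSTART - 1)         # text before "bird"
        if (pre !~ /cat/) {
            m = substr($0, RSTART, RLENGTH)     # e.g. "bird 1234567"
            sub(/[0-9]+$/, "0", m)              # zero the number
            $0 = pre m substr($0, RSTART + RLENGTH)
            done = 1
        }
    }
    { print }' "$@"
}

printf '%s\n' 'dog cat - fish bird 123' 'dog fish - bird 1234567' \
              'dog - cat fish, lion bird 3456' 'dog fish - bird 42' |
    zero_first_bird_awk
```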

sed: replace only in part of string

I have a simple playlist of song files:
1003 James Brown - The Boss Unknown Artist.mp3
1004 James Brown - Slaughters Theme Unknown Artist.mp3
1005 James Brown - Payback(1) Unknown Artist.mp3
...
I would like them in the following format:
1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
1004 James_Brown_-_Slaughters_Theme_Unknown_Artist.mp3
...
Notice that the space after the leading number is NOT replaced. I have the following simple sed script:
sed "s/ /_/g"
but that also replaces the space after the number. I know how to form capture groups, but that will not help either. How can I convince sed to apply the replacement to only a portion of the input string, rather than the whole string?

You could do
sed 's/ /_/g; s/_/ /'
I.e. first turn all spaces into underscores, then turn the first underscore back into a space.
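A sketch that runs the answer's command on the playlist from the question, next to an alternative loop (an assumption of this sketch, not part of the answer) that never touches text before the first space. Function names are hypothetical:

```shell
# The answer's approach: every space -> "_", then the first "_" back to " ".
underscore_after_number() {
    sed 's/ /_/g; s/_/ /' "$@"
}

# Alternative sketch: repeatedly turn the last space that follows the first
# space into "_" until only the first space remains (ta loops while s succeeds).
underscore_loop() {
    sed -e ':a' -e 's/\( .*\) /\1_/' -e 'ta' "$@"
}

playlist='1003 James Brown - The Boss Unknown Artist.mp3
1004 James Brown - Slaughters Theme Unknown Artist.mp3
1005 James Brown - Payback(1) Unknown Artist.mp3'
printf '%s\n' "$playlist" | underscore_after_number
printf '%s\n' "$playlist" | underscore_loop
```

Both give the same result here. They differ only if a line already contains an underscore before its first space: the two-step command turns that underscore into a space, while the loop leaves it alone.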

We Keep Coding
