how to write repeat patterns - sed

A file has a list of unique "tags" and "values" separated by a tab. I want to repeat each tag the number of times given by its value. Example input file:
tag value
AAAAA 2
BBBBB 1
CCCCC 3
DDDDD 5
Expected output file:
AAAAA
AAAAA
BBBBB
CCCCC
CCCCC
CCCCC
DDDDD
DDDDD
DDDDD
DDDDD
DDDDD
Could you please tell me the awk/sed command? Thanks a lot.

An alternative version for GNU awk:
awk '{while($2--) print $1}' file
This is not a good problem to solve with sed. You would need to replace a number n with n 1's (for example, 3 with 111) and print the word once for each 1 you consume.

This awk should do:
awk '{for (i=1;i<=$2;i++) print $1}' file
AAAAA
AAAAA
BBBBB
CCCCC
CCCCC
CCCCC
DDDDD
DDDDD
DDDDD
DDDDD
DDDDD
It loops the number of times given in column 2, printing the word in column 1 on each iteration.
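If the file really begins with the tag/value header line shown in the question, that line has no numeric count to loop over. A minimal sketch that only processes lines whose second column is a non-negative integer; repeat_tags is a hypothetical helper name, and tab-separated input is assumed as described in the question:

```shell
# Hypothetical helper: print column 1 as many times as column 2 says.
# Lines whose second column is not a non-negative integer (for example
# a "tag value" header) are skipped.
repeat_tags() {
  awk -F'\t' '$2 ~ /^[0-9]+$/ { for (i = 0; i < $2; i++) print $1 }' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf 'tag\tvalue\nAAAAA\t2\nBBBBB\t1\n' | repeat_tags
# prints:
#   AAAAA
#   AAAAA
#   BBBBB
```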

Here is a Perl alternative:
$ perl -ane 'print "$F[0]\n"x$F[1]' file
AAAAA
AAAAA
BBBBB
CCCCC
CCCCC
CCCCC
DDDDD
DDDDD
DDDDD
DDDDD
DDDDD

This might work for you (GNU sed):
sed -r 's/(\S+)\s+(\S+)/seq \2 | sed c\1/e' file
Split the line into arguments for seq and sed commands and evaluate.
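To see what the e flag is asked to execute, one can drop the flag and print the generated command lines instead. A small sketch, assuming GNU sed (for -r, \S and \s) and whitespace-separated input; show_generated is a hypothetical name:

```shell
# Hypothetical helper: same substitution as above, but without the e flag,
# so the generated shell commands are printed rather than executed.
show_generated() {
  sed -r 's/(\S+)\s+(\S+)/seq \2 | sed c\1/' "$@"
}

printf 'AAAAA\t2\nBBBBB\t1\n' | show_generated
# prints:
#   seq 2 | sed cAAAAA
#   seq 1 | sed cBBBBB
```

Each generated line is a small pipeline: seq emits as many lines as the value, and the inner sed replaces every one of them with the tag.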

Doing arithmetic in sed is a pain, so I would avoid it. awk and perl are good choices, but you can also do it straightforwardly with bash:
while read -r tag value; do
    while ((value--)); do
        printf "%s\n" "$tag"
    done
done < infile
Or as a one-liner:
while read -r tag value; do while ((value--)); do printf "%s\n" "$tag"; done; done < infile
Output:
AAAAA
AAAAA
BBBBB
CCCCC
CCCCC
CCCCC
DDDDD
DDDDD
DDDDD
DDDDD
DDDDD

Related

Replace game1 with game001 with sed nested commands

I want to use
cat 1.txt | sed -E 's,game([0-9]+),game$(printf %03d \1),g'
to change 1.txt from:
game1 xxx vs yyy
game11 aaa vs bbb
to:
game001 xxx vs yyy
game011 aaa vs bbb
but the result is:
$ echo "game1 xxx vs yyy" | sed -E 's,game([0-9]+),game$(printf %03d \1),g'
game$(printf %03d 1) xxx vs yyy
How can I get printf %03d \1 evaluated correctly?

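You need to use double quotes if you want the shell to perform the command substitution:
sed -E "s,game([0-9]+),game$(printf %03d \1),g" 1.txt
Edit:
However, I don't think sed can pass the value of \1 to an external command this way: the shell expands $(printf %03d \1) before sed runs, so every match is replaced with game001. perl can help in this case: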
$ cat 1.txt
game1 xxx vs yyy
game11 aaa vs bbb
game21 aaa vs bbb
$ sed -E "s,game([0-9]+),game$(printf %03d \1),g" 1.txt
game001 xxx vs yyy
game001 aaa vs bbb
game001 aaa vs bbb
$ # can also use: perl -pe 's/game\K\d+/sprintf "%03d", $&/ge'
$ perl -pe 's/game([0-9]+)/sprintf "game%03d", $1/ge' 1.txt
game001 xxx vs yyy
game011 aaa vs bbb
game021 aaa vs bbb
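The game001 on every line of the double-quoted sed run follows from the order of evaluation: inside double quotes the shell runs $(printf %03d \1) first, and at that point \1 is just an escaped 1. A quick way to check what sed actually receives is to print the script instead of running it (a sketch, assuming a POSIX shell; the variable name script is only for illustration):

```shell
# Build the sed script exactly as the double-quoted command line would.
script="s,game([0-9]+),game$(printf %03d \1),g"

# Show what sed is handed: the backreference is already gone.
printf '%s\n' "$script"
# prints: s,game([0-9]+),game001,g
```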

You can't combine shell commands and sed backreferences like this (and if you could, you'd have to double-quote the sed command; see the other answer). The shell would try to evaluate the command before sed sees it, but \1 wouldn't mean anything to the shell.
You can do it as follows, though:
$ sed -E 's/^(game)([[:digit:]]+)/\100\2/;s/^(game).{0,2}([[:digit:]]{3})/\1\2/' 1.txt
game001 xxx vs yyy
game011 aaa vs bbb
The first substitution, s/^(game)([[:digit:]]+)/\100\2/, adds two zeros in front of the digits after game:
$ sed -E 's/game([[:digit:]]+)/game00\1/' 1.txt
game001 xxx vs yyy
game0011 aaa vs bbb
The second substitution, s/^(game).{0,2}([[:digit:]]{3})/\1\2/, removes up to two characters between game and the three digits that follow it, to get rid of unwanted extra zeros.
Notice that:
- I've used / instead of , as the delimiter, just because I'm more used to it.
- I've anchored game at the start of the line with ^.
- I've used one more capture group for game so I don't have to type it twice per command.
- I've used the POSIX character class [[:digit:]] instead of [0-9].
- I've used sed '<command>' 1.txt instead of cat 1.txt | sed '<command>' to avoid the useless use of cat.

Just use awk:
$ awk '{sub(/game/,""); $1=sprintf("game%03d",$1)} 1' file
game001 xxx vs yyy
game011 aaa vs bbb
or, more generally, to operate on saved capture groups, use GNU awk's third argument to match():
$ awk 'match($0,/(game)([0-9]+)(.*)/,a){ printf "%s%03d%s\n", a[1], a[2], a[3] }' file
game001 xxx vs yyy
game011 aaa vs bbb
With sed you'd need:
$ sed -E 's/(game)([0-9]) /\10\2 /; s/(game)([0-9]{2}) /\10\2 /' file
game001 xxx vs yyy
game011 aaa vs bbb

Using sed selectively to delete lines

I have a text file (say file)
Name
aaa
bbb
ccc
Name
xxxx
Name
yyyy
tttt
I want to remove "Name" from the file except if it occurs in the header. I know sed removes lines, but if I do
sed '/Name/d' file
it removes all "Name".
Desired output:
Name
aaa
bbb
ccc
xxxx
yyyy
tttt
Can you suggest what options I should use?

Use this:
sed '1!{/Name/d}' file
The delete command is applied to every line except the first.

If you know that the first header is on the first line, skip it like this:
sed '1!{/Name/d}' infile
That means the pattern is applied to all lines except line 1.
Or the other way around:
sed -n '2,${/Name/d};p' infile
Perhaps with awk:
awk '/Name/ && c++ == 0 || !/Name/' infile
Output in all cases:
Name
aaa
bbb
ccc
xxxx
yyyy
tttt

You might find the awk syntax more intuitive:
awk 'NR==1 || !/Name/' file
The above just says: if it's line number 1, or the line doesn't contain "Name", print it.

sed - What's the difference here?

test.txt file contains:
AAAAA
BBBBB
CCCCC
or in hex:
41 41 41 41 41 0A 42 42 42 42 42 0A 43 43 43 43 43 0A
If I run:
sed s/A/B/g test.txt
it returns:
BBBBB
BBBBB
CCCCC
Likewise:
sed 's/\x41/B/g' test.txt
returns:
BBBBB
BBBBB
CCCCC
but if I run:
sed 's/\x0A/B/g' test.txt
it still returns:
AAAAA
BBBBB
CCCCC
Why?

sed works on one line at a time. For each line of the file, sed strips the trailing newline (\n), places the line in the pattern space, and runs the script on it. Once the script is done, sed appends the newline again, prints the pattern space (unless printing is suppressed with the -n option), and reads the next line into the pattern space. This continues until the end of the file is reached.
In your attempt, by the time the substitution runs on the first line, sed has already removed the newline from it, so the substitution is a no-op. sed then puts the newline back, prints the line, reads the second line into the pattern space, and continues.
To get your desired output, you have to read the entire file into the pattern space, with the lines separated by newline characters.
You can do so by saying:
$ sed ':a;N;s/\x0A/B/;ba' file
AAAAABBBBBBBCCCCC
:a creates a label.
N appends the next line to the pattern space, separated by a newline, so the pattern space now contains line1\nline2.
s/\x0A/B/ replaces the \n in the pattern space with B.
ba tells sed to jump back to label :a and repeat the process.
On the second pass, sed again appends the next line to the pattern space, which now looks like line1Bline2\nline3. When the substitution runs again, you are left with your desired output.
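The missing newline can be observed with sed's l command, which prints the pattern space in an unambiguous form: the end is marked with $ and an embedded newline is shown as \n. A minimal sketch on two-line input:

```shell
# One line per cycle: the pattern space never contains a newline.
printf 'AAAAA\nBBBBB\n' | sed -n 'l'
# prints:
#   AAAAA$
#   BBBBB$

# After N, both lines sit in the pattern space, joined by a newline.
printf 'AAAAA\nBBBBB\n' | sed -n 'N;l'
# prints:
#   AAAAA\nBBBBB$
```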

removing text before and after ()

How do I remove the text up to the first ( and after the )?
INSERT INTO `todel` VALUES (1,'akbar\'s','Mumbai, Delhi road, India');
INSERT INTO `todel` VALUES (2,'amar\"s','South Africa, ghana');
The expected output is like this...
1,'akbar\'s','Mumbai, Delhi road, India'
2,'amar\"s','South Africa, ghana'

Ruby(1.9+)
$> ruby -ne 'print $_.gsub(/.*\(|\).*$/,"")' file
1,'akbar\'s','Mumbai, Delhi road, India'
2,'amar\"s','South Africa, ghana'
or the shell(bash)
$> while read -r line; do line=${line#*(}; echo "${line%)*}"; done <file
1,'akbar\'s','Mumbai, Delhi road, India'
2,'amar\"s','South Africa, ghana'
or awk
$> awk '{sub(/.*\(/,"");sub(/\).*/,"")}1' file
1,'akbar\'s','Mumbai, Delhi road, India'
2,'amar\"s','South Africa, ghana'
or sed
$> sed -rn 's/.*\(//;s/\).*//p' file
1,'akbar\'s','Mumbai, Delhi road, India'
2,'amar\"s','South Africa, ghana'

awk can take a regular expression as field separator, so use either parenthesis as the field separator and just emit the 2nd field:
awk -F'[()]' '{print $2}' filename

You can remove everything from the beginning of the line up to the first ( and from the last ) (inclusive) to the end of the line with sed:
sed -r 's/^[^(]*\((.*)\)[^)]*$/\1/' file
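As a sketch of the same idea with the capture group written out in full, the helper below keeps everything between the first ( and the last ) on each line, so parentheses inside the values survive. strip_outer is a hypothetical name, the sample row is made up for illustration, and -E is assumed to be available (GNU sed accepts it as a synonym for -r):

```shell
# Hypothetical helper: keep only the text between the first "(" and the
# last ")" of every line.
strip_outer() {
  sed -E 's/^[^(]*\((.*)\)[^)]*$/\1/' "$@"
}

printf '%s\n' "INSERT INTO t VALUES (1,'a (b) c');" | strip_outer
# prints: 1,'a (b) c'
```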

difference between the content of two files

I have two files, one of which is a subset of the other, and I want to obtain a file containing the lines that are not common to both. For example:
File1
apple
mango
banana
orange
jackfruit
cherry
grapes
eggplant
okra
cabbage
File2
apple
banana
cherry
eggplant
cabbage
The resulting file, the difference of the two files above:
mango
orange
jackfruit
grapes
okra
Any ideas on this are appreciated.

You can sort the files then use comm:
$ comm -23 <(sort file1.txt) <(sort file2.txt)
grapes
jackfruit
mango
okra
orange
You might also want to use comm -3 instead of comm -23:
-1 suppress lines unique to FILE1
-2 suppress lines unique to FILE2
-3 suppress lines that appear in both files

1. Lines that occur only once, in either file
cat File1 File2 | sort | uniq -u
2. Lines only in the first file
cat File1 File2 File2 | sort | uniq -u
3. Lines only in the second file
cat File1 File1 File2 | sort | uniq -u
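The doubled file name is what makes this work: uniq -u keeps only lines that occur exactly once after sorting, so listing a file twice guarantees that none of its lines survive. A minimal sketch, assuming neither file contains duplicate lines of its own; only_in_first is a hypothetical name and the sample files are created just for the demonstration:

```shell
# Hypothetical helper: lines present in the first file but not the second.
# Assumes each file has no repeated lines of its own.
only_in_first() {
  # $1 = first file, $2 = second file (listed twice so its lines never
  # count as unique)
  cat "$1" "$2" "$2" | sort | uniq -u
}

dir=$(mktemp -d)
printf 'apple\nmango\nbanana\n' > "$dir/File1"
printf 'apple\nbanana\n' > "$dir/File2"
only_in_first "$dir/File1" "$dir/File2"
# prints: mango
rm -r "$dir"
```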

Use awk; no sorting is necessary, which reduces overhead:
$ awk 'FNR==NR{f[$1];next}(!($1 in f)) ' file2 file1
mango
orange
jackfruit
grapes
okra

1. Lines not common to both files
diff --changed-group-format="%<%>" --unchanged-group-format="" file1 file2
2. Lines unique to the first file
diff --changed-group-format="%<" --unchanged-group-format="" file1 file2
3. Lines unique to the second file
diff --changed-group-format="%>" --unchanged-group-format="" file1 file2
Note that diff compares the files in order, so this relies on the shared lines appearing in the same order in both files, as they do in the example.
Hope it works for you
