SED replace first few occurrences (and ranges) of pattern - sed

Is it possible to change the first 4 (or more) occurrences of a string in this scenario using SED (the opposite of sed -r 's/[^[:space:]]*/TEST/4g'), so that the result is:
TEST TEST TEST TEST five six seven
I got it working by reversing the word order in the line with AWK twice, but this is long and complex, and I want to do it with just SED:
echo one two three four five six seven | awk '{for(i=NF;i>=1;i--) printf "%s ", $i;print ""}' | sed -r 's/[^ ]*/TEST/4g' | awk '{for(i=NF;i>=1;i--) printf "%s ", $i;print ""}'
Also, is there maybe an option to change ranges of occurrences, like 3-5, 6-12, ...?
Example input is:
one two three four five six seven
eight nine ten eleven twelve thirteen fourteen
fifteen sixteen seventeen eighteen nineteen twenty twenty-one

What about a single AWK:
awk '{for(i=1;i<=NF;i++) if(i<5){$i="TEST"}; print}'
Test run:
$ echo one two three four five six seven | awk '{for(i=1;i<=NF;i++) if(i<5){$i="TEST"}; print}'
TEST TEST TEST TEST five six seven
This solution is short, readable and maintainable. If it does not satisfy you, please add some details about your specific problem.
Perl equivalent solution:
perl -pe 's/\S+/$i++<4?"TEST":$&/ge'
Test run:
$ echo one two three four five six seven | perl -pe 's/\S+/$i++<4?"TEST":$&/ge'
TEST TEST TEST TEST five six seven
maybe there is option to change ranges of occurrence like 3-5, 6-12
AWK:
awk '{for(i=3;i<6;i++)$i="TEST";print}'
Test run on the newly provided input file:
$ awk '{for(i=3;i<6;i++)$i="TEST";print}' input
one two TEST TEST TEST six seven
eight nine TEST TEST TEST thirteen fourteen
fifteen sixteen TEST TEST TEST twenty twenty-one
Perl:
perl -pe '$c=0;s/\S+/++$c~~[3..5]?"TEST":$&/ge'
Test run on the newly provided input file:
$ perl -pe '$c=0;s/\S+/++$c~~[3..5]?"TEST":$&/ge' input
Smartmatch is experimental at -e line 1. <== This is a warning that goes to STDERR
one two TEST TEST TEST six seven
eight nine TEST TEST TEST thirteen fourteen
fifteen sixteen TEST TEST TEST twenty twenty-one

The answer has been provided here by mikeserv. NOTE: if you want to process a range, you need to use the maximum bound, as it will process as many matches as it can without throwing any exceptions/errors.
GNU sed:
echo 'one two three four five six seven' | \
sed 's/[^[:space:]]*/\n&/g;:t;/\n/{x;/.\{4\}/!{s/$/./;x;s/\n[^[:space:]]*/TEST/;bt};x};s/\n//g'
POSIX sed:
nl='
';
echo 'one two three four five six seven' | sed "s/[^[:space:]]*/\\$nl&/g;:t${nl}/\n/{x;/.\{4\}/!{${nl}s/$/./;x;s/\n[^[:space:]]*/TEST/;bt$nl};x$nl};s/\n//g"
See the online sed demo.
Original explanation (note that here, 1 is replaced with 2, you may use any other patterns):
There I use two notable techniques. In the first place every
occurrence of 1 on a line is replaced with \n1. In this way, as I
do the recursive replacements next, I can be sure not to replace the
occurrence twice if my replacement string contains my replace
string. For example, if I replace he with hey it will still work.
I do this like:
s/1/\
&/g
Secondly, I am counting the replacements by adding a character to
hold space for each occurrence. Once I reach three no more occur. If
you apply this to your data and change the \{3\} to the total
replacements you desire and the /\n1/ addresses to whatever you mean
to replace, you should replace only as many as you wish.
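One thing the explanation above leaves implicit: the dots that count replacements live in hold space, and sed does not clear hold space between input lines, so on multi-line input the counter carries over from one line to the next. Below is a minimal sketch, assuming GNU sed, that empties hold space at the start of every line and takes the count as a parameter. The function name replace_first_n is made up for this example, and the marker step requires at least one character per word (a small deviation from the command above) so that empty matches are not counted.

```shell
# replace_first_n N: replace the first N whitespace-delimited words of every
# input line with TEST (GNU sed; hypothetical helper, not from the answer).
replace_first_n() {
    sed '
        # start each line with an empty counter in hold space
        x; s/.*//; x
        # put a newline marker in front of every word
        s/[^[:space:]]\{1,\}/\n&/g
        :t
        /\n/{
            x
            # fewer than N dots so far: count one more, replace one more word
            /.\{'"$1"'\}/!{
                s/$/./
                x
                s/\n[^[:space:]]*/TEST/
                bt
            }
            x
        }
        # drop the markers that are left
        s/\n//g
    '
}

printf '%s\n' 'one two three four five six seven' \
              'eight nine ten eleven twelve thirteen fourteen' | replace_first_n 4
```

With the sample input this should print TEST four times at the start of both lines, followed by the untouched remainder of each line.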

This is a completely inappropriate task for sed as sed is for doing simple s/old/new/ on individual strings, that is all. With any awk in any shell on every UNIX box:
$ echo one two three four five six seven | awk '{for (i=1; i<=4; i++) $i="TEST"}1'
TEST TEST TEST TEST five six seven
$ echo one two three four five six seven | awk '{for (i=3; i<=5; i++) $i="TEST"}1'
one two TEST TEST TEST six seven
and if you need to parameterize it:
echo one two three four five six seven |
awk -v beg=3 -v end=5 '{for (i=beg; i<=end; i++) $i="TEST"}1'
one two TEST TEST TEST six seven
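One caveat with the fixed loop bounds: assigning to a field beyond NF makes awk create that field, so a line with only two words would come out with four TESTs. A small sketch that clamps the upper bound to NF (the wrapper name replace_fields is an assumption of this example, not part of the answer):

```shell
# replace_fields BEG END: replace fields BEG..END with TEST, never past NF.
replace_fields() {
    awk -v beg="$1" -v end="$2" '{
        last = (end < NF) ? end : NF      # clamp so short lines gain no extra fields
        for (i = beg; i <= last; i++) $i = "TEST"
        print
    }'
}

echo 'one two three four five six seven' | replace_fields 3 5
echo 'one two' | replace_fields 1 4
```

As with the commands above, assigning to a field makes awk rebuild the line with single spaces between fields, so runs of whitespace in modified lines are collapsed.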

$ echo "one two three four fix six" | \
sed -E ':r s/(^|(TEST )+)[^ ]*/\1TEST/;/^(TEST ){4}/!br'
TEST TEST TEST TEST fix six
Explanation:
:r label named r to branch back to
s/(^|(TEST )+)[^ ]*/\1TEST/; replacement that replaces just one occurrence of a non-TEST word, preceded by either the start of the line or 1 or more TESTs
/^(TEST ){4}/!br regex for what's wanted, followed by the !br to branch back to :r if it's not matched yet.
Clearly this is fragile. It will loop infinitely on any line with fewer than five words, because the exit test /^(TEST ){4}/ requires a space after the fourth TEST. Might be GNU sed only.

Related

How to change the first occurrence of a line containing a pattern?

I need to find the line with first occurrence of a pattern, then I need to replace the whole line with a completely new one.
I found this command that replaces the first occurrence of a pattern, but not the whole line:
sed -e "0,/something/ s//other-thing/" <in.txt >out.txt
If in.txt is
one two three
four something
five six
something seven
As a result I get in out.txt:
one two three
four other-thing
five six
something seven
However, when I try to modify this code to replace the whole line, as follows:
sed -e "0,/something/ c\COMPLETE NEW LINE" <in.txt >out.txt
This is what I get in out.txt:
COMPLETE NEW LINE
five six
something seven
Do you have any idea why the first line is lost?
When used with two addresses, the c\ command deletes every line from the first matching address through the second matching address, inclusive, and prints the text that follows c\ once, when the second address matches. If no line in the input matches the second address, it simply deletes every line from the first matching address through the last line. Since you want to replace only one line, you shouldn't use the c\ command on an address range. In normal usage, c\ is immediately followed by a newline character.
The 0,/regexp/ address range is a GNU sed extension that also tries to match regexp on the first input line, which 1,/regexp/ does not do. So, the correct command in GNU sed could be
sed '0,/something/{/something/c\
COMPLETE NEW LINE
}' < in.txt
or simplified as pointed out by Sundeep
sed '0,/something/{//c\
COMPLETE NEW LINE
}' < in.txt
or a one-liner,
sed -e '0,/something/{//cCOMPLETE NEW LINE' -e '}' < in.txt
if a literal new-line character is not desirable.
This one-liner also works as pointed out by potong:
sed '0,/something/!b;//cCOMPLETE NEW LINE' in.txt
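The difference between 0,/regexp/ and 1,/regexp/ only shows when the very first line matches. A small demonstration, assuming GNU sed (the sample text and the helper name sample are invented for illustration):

```shell
# Three-line sample in which the very first line already matches.
sample() { printf '%s\n' 'something A' 'plain' 'something B'; }

# 0,/re/ : the range may end on line 1, so only the first match is touched.
sample | sed '0,/something/ s/something/X/'
# X A
# plain
# something B

# 1,/re/ : the end address is only tested from line 2 on, so the range runs
# through line 3 and both matching lines are changed.
sample | sed '1,/something/ s/something/X/'
# X A
# plain
# X B
```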
This might work for you (GNU sed):
sed '1!b;:a;/something/!{n;ba};cCOMPLETE NEW LINE' file
Set up a loop that will only operate from the first line.
Within the loop, if the key word is not found in the current line, print the current line, fetch the next and repeat until the end of the file or a match is found.
When a match is found, change the contents of the current line to the required result.
N.B. The c command terminates any further processing of sed commands in the same way the d command does.
If there are lines in the input following the key word match, the negation of address at the start of the sed cycle will capture these lines and result in their printing and no further processing.
An alternative:
sed 'x;/./{x;b};x;/something/h;//cCOMPLETE NEW LINE' file
Or (specific to GNU and bash):
sed $'0,/something/{//cCOMPLETE NEW LINE\n}' file
Just use awk:
$ awk '!done && sub(/something/,"other-thing"){done=1} {print}' file
one two three
four other-thing
five six
something seven
$ awk '!done && sub(/.*something.*/,"other-thing"){done=1} {print}' file
one two three
other-thing
five six
something seven
$ awk '!done && /something/{$0="other-thing"; done=1} {print}' file
one two three
other-thing
five six
something seven
and look what you can trivially do if you want to replace the Nth occurrence of something:
$ awk -v n=1 '/something/ && (++cnt == n){$0="other-thing"} {print}' file
one two three
other-thing
five six
something seven
$ awk -v n=2 '/something/ && (++cnt == n){$0="other-thing"} {print}' file
one two three
four something
five six
other-thing

Using Sed to Delete multiple lines using a file with patterns

I am currently using sed to delete lines matching various patterns, together with the line that follows each match, using the following code:
sed -i -e"/String1/,+1d" -e"/String2/,+1d" filename.txt
It works very well; however, I have a lot of patterns, which vary from time to time.
Is it possible to put all the patterns in another text file and make sed delete all entries for the patterns found in that file?
Thanks
Here is an awk version
awk 'NR==FNR {a[$0]++;next} {for (i in a) if ($0~i) f=2} --f<0' list yourfile
NR==FNR {a[$0]++;next} store the lines of the file list (the patterns to remove) in array a
for (i in a) for every line of yourfile, loop through all patterns from list
if ($0~i) f=2 if a trigger line is found, set flag f to 2
--f<0 decrement flag f by one and test whether it is less than 0; if so, print the line.
example
cat yourfile
one
two
three
four
five
six
seven
eight
nine
ten
eleven
cat list
three
eight
awk 'NR==FNR {a[$0]++;next} {for (i in a) if ($0~i) f=2} --f<0' list yourfile
one
two
five
six
seven
ten
eleven
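Because the match is done with $0~i, every line of list is interpreted as a regular expression. If the patterns are meant as plain strings, a variant along these lines may be safer. This is only a sketch: the name delete_fixed and the sample data are assumptions, and it treats each non-empty line of the list as a literal substring via index().

```shell
# delete_fixed LIST FILE: drop every line of FILE that contains a line of LIST
# as a literal substring, together with the line that follows it.
delete_fixed() {
    awk 'NR == FNR { if ($0 != "") pat[$0]; next }      # first file: remember the strings
         { for (p in pat) if (index($0, p)) skip = 2 }  # literal hit starts a 2-line skip
         --skip < 0                                     # print only while no skip is active
    ' "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' 'a.c' > "$dir/list"
printf '%s\n' 'abc' 'x' 'a.c' 'y' 'z' > "$dir/data"
delete_fixed "$dir/list" "$dir/data"    # keeps abc, x and z
rm -r "$dir"
```

With the regex version the line abc would also match a.c, since the dot matches any character. Like the original, this relies on NR==FNR, so it assumes the list file is not empty.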
Trying to stick with sed - at all cost, and being creative :-)
Consider using sed itself to generate the sed script that will perform the substitutions, based on the patterns file.
It is important to note that this solution processes each input file in one pass, which makes it usable on large files and with many patterns.
Proposed Solution:
sed -i -e "$(sed -e '/\//d;s/^/\//;s/$/\/,+1d/' < patterns.txt)" filename.txt
The embedded sed program (sed -e '/\//d;s/^/\//;s/$/\/,+1d/ ...) will convert the patterns.txt to a small sed script:
patterns.txt:
three
eight
foo/bar
Output: (notice that foo/bar is ignored because it contains '/')
/three/,+1d
/eight/,+1d
Notes, Limitations, etc:
One limitation of the above implementation is the delimiter: the code removes any pattern containing '/' to simplify generating the sed script and to avoid potential injection. It is possible to work around this limitation and allow an alternate delimiter (by escaping special characters in the pattern, or by using the '\%' addresses). This may need additional testing.
Code assumes that the patterns are valid RE.
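The escaping workaround mentioned in the notes could look roughly like this. It is a sketch only: the helper name gen_delete_script and the sample files are invented here, it assumes GNU sed (for the addr,+1 form) and it assumes the patterns do not already contain escaped slashes.

```shell
# gen_delete_script PATTERNS: print one "/re/,+1d" command per non-empty line,
# escaping "/" inside each pattern instead of dropping the pattern.
gen_delete_script() {
    sed -e '/^$/d' -e 's:/:\\/:g' -e 's:^:/:' -e 's:$:/,+1d:' "$1"
}

dir=$(mktemp -d)
printf '%s\n' 'three' 'foo/bar' > "$dir/patterns.txt"
printf '%s\n' 'one' 'three' 'four' 'foo/bar x' 'five' 'six' > "$dir/data.txt"

gen_delete_script "$dir/patterns.txt"      # shows the generated script
sed -e "$(gen_delete_script "$dir/patterns.txt")" "$dir/data.txt"
rm -r "$dir"
```

For this sample the generated script has the two lines /three/,+1d and /foo\/bar/,+1d, and the final sed call leaves only one and six.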

How to replace every 2nd tab character with a newline character using sed

given the input
123\t456\tabc\tdef
create the output
123\t456\nabc\tdef
which would display like
123 456
abc def
Note that it needs to work across multiple lines, not just two.
EDIT
a better example might help clarify.
input (there is only expected to be 1 line of input)
1\t2\t3\t4\t5\t6\t7\t8
expected output
1 2
3 4
5 6
7 8
...
With GNU sed:
sed 's/\t/\n/2;P;D;' file
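Replaces the second occurrence of a tab character with a newline character, prints the part of the pattern space up to that newline (P), deletes it (D), and repeats on the remainder of the line.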
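To see the P;D cycle at work on the single-line input from the question, here is a short sketch, assuming GNU sed (which understands \t in the pattern and \n in the replacement). The wrapper name pair_up is made up for the example.

```shell
# pair_up: break a tab-separated line after every second field (GNU sed).
#   s/\t/\n/2  turn the 2nd tab of the pattern space into a newline
#   P          print the pattern space up to that newline
#   D          delete what was printed and rerun the script on the rest
pair_up() {
    sed 's/\t/\n/2;P;D'
}

printf '1\t2\t3\t4\t5\t6\t7\t8\n' | pair_up
```

This prints four lines holding 1 and 2, 3 and 4, 5 and 6, 7 and 8, each pair still separated by a tab. When fewer than two tabs remain, the substitution does nothing, P prints what is left, and D ends the cycle.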
This little trick should work:
sed 's/\(\t[^\t]*\)\t/\1\n/g' < input_file.txt
EDIT:
Below is an example:
$ cat 1.txt
one two three four five six seven
five six seven
$ sed 's/\(\t[^\t]*\)\t/\1\n/g' < 1.txt
one two
three four
five six
seven
five six
seven
$
EDIT2:
For MacOS' standard sed try this:
$ sed -E $'s/(\t[^\t]*)\t/\\1\\\n/g' < 1.txt
$'...' makes bash expand the escape sequences (such as \t) before sed sees them.
Let's say the following is the Input_file:
cat Input_file
123 456 abc def
Then the following may help you get it into 2 columns.
xargs -n2 < Input_file
Output will be as follows.
123 456
abc def

How do I replace lines between two patterns with a single line in sed?

This is my input file:
one
two
three
four
five
six
seven
eight
nine
ten
I want to turn the file into
one
two
three
NEW LINE
eight
nine
ten
with sed. That is, I want to replace the lines from /four/ (including) to /seven/ (including) with the single line NEW LINE.
I can do that with
sed '/four/aNEW LINE
/four/,/seven/d' file.txt
But I am wondering if there is a simpler way, notably one without having to repeat a pattern (as I needed to with /four/).
Edit: As per fedorqui's comment-question, this can also be done in awk (although for "academic" purposes I'd be interested in sed solutions).
Edit 2: Unfortunately, the input file suggests that there is a logical order of words in the input file (one followed by two followed by three, etc.). In my "real world" problem, this is not the case, however. I have no idea how many lines the file has, nor what precedes or follows the lines four and seven. The only thing I know is that there is a line four which is (not necessarily immediately) followed by a line seven. I am sorry for not stating this clearly when I asked the question, especially because fedorqui has put so much effort into his answer.
Perl is pretty concise, and you don't need to repeat any keywords:
perl -0777 -pe 's/four.*seven/NEW LINE/s'
Here is how you do in sed:
$ sed ':a;N;s/four.*seven/NEW LINE/;ba' file
one
two
three
NEW LINE
eight
nine
ten
The logic is much the same as in Glenn's answer: slurp the entire file into one long line separated by newlines, then replace everything from four to seven with NEW LINE.
With sed, you can delete from line four to seven and append after seven. Which is in fact what you posted in your question :)
$ sed -e '/seven/a \NEW LINE' -e '/four/,/seven/d' file
one
two
three
NEW LINE
eight
nine
ten
With awk you can do:
$ awk '/four/ {f=1} !f; /seven/ {print "NEW LINE"; f=0}' file
one
two
three
NEW LINE
eight
nine
ten
What it does is to keep updating the flag f that stops the printing.
When "four" is found, the flag is activated.
When "seven" is found, the flag is deactivated, printing also the NEW LINE.
This might work for you (GNU sed & bash):
sed $'/^four/{:a;N;/^seven/McNEW LINE\nba}' file
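For completeness, here is a variant that names each pattern only once by putting the c command directly on the address range. It is not taken from the answers above; it is a sketch based on how c behaves with two addresses (the range is deleted and the text is printed once, at the end of the range). The function name replace_block is invented for the example.

```shell
# replace_block: replace the lines from /four/ through /seven/ (inclusive)
# with the single line "NEW LINE".
replace_block() {
    sed '/four/,/seven/c\
NEW LINE'
}

printf '%s\n' one two three four five six seven eight nine ten | replace_block
```

For the sample input this prints one, two, three, NEW LINE, eight, nine, ten. As with the other range-based commands, if no seven follows four, everything from four to the end of the input is dropped.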

regular expression in sed for masking credit card

We need to mask credit card numbers, masking all but the last 4 digits. I am trying to use SED. As the credit card number length varies from 12 to 19 digits, I am trying to write a regular expression. The following code will receive the string. If it contains a string of the form "CARD_NUMBER=3737291039299199", it will mask the first 12 digits.
The problem is how to write a regular expression for credit cards that are 12 to 19 digits long. If I write another expression for 12 digits, it doesn't work. That is, for a 12-digit credit card the first 8 digits should be masked; for a 15-digit credit card, the first 11 digits should be masked.
while read data; do
var1=${#data}
echo "Length is "$var1
echo $data | sed -e "s/CARD_NUMBER=\[[[:digit:]]\{12}/CARD_NUMBER=\[\*\*\*\*\*\*\*\*/g"
done
How about
sed -e :a -e "s/[0-9]\([0-9]\{4\}\)/\*\1/;ta"
(This works in my shell, but you may have to add or remove a backslash or two.) The idea is to replace a digit followed by four digits with a star followed by the four digits, and repeat this until it no longer triggers.
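A quick way to check the loop against different card lengths is sketched below. The wrapper name mask_card and the 12-digit sample number are made up for the example; the sed program is the one above, written with single quotes so that no extra backslashes are needed.

```shell
# mask_card: star out every digit that is still followed by four more digits.
#   :a   label to jump back to
#   s    replace one digit + four digits with "*" + the same four digits
#   ta   jump back to :a as long as the substitution changed something
mask_card() {
    sed -e :a -e 's/[0-9]\([0-9]\{4\}\)/*\1/' -e ta
}

echo 'CARD_NUMBER=3737291039299199' | mask_card   # 16 digits: 12 stars + 9199
echo 'CARD_NUMBER=123456789012' | mask_card       # 12 digits:  8 stars + 9012
```

Note that the pattern is not tied to CARD_NUMBER=, so any run of five or more digits elsewhere on the line would be masked in the same way.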
This does it in one sed command without an embedded newline:
sed -r 'h;s/.*([0-9]{4})/\1/;x;s/CARD_NUMBER=([0-9]*)([0-9]{4})/\1/;s/./*/g;G;s/\n//'
If your sed doesn't have -r:
sed 'h;s/.*\([0-9]\{4\}\)/\1/;x;s/CARD_NUMBER=\([0-9]*\)\([0-9]\{4\}\)/\1/;s/./*/g;G;s/\n//'
If your sed needs -e:
sed -e 'h' -e 's/.*\([0-9]\{4\}\)/\1/' -e 'x' -e 's/CARD_NUMBER=\([0-9]*\)\([0-9]\{4\}\)/\1/' -e 's/./*/g' -e 'G' -e 's/\n//'
Here's what it's doing:
duplicate the number so it's in pattern space and hold space
grab the last four digits
swap them into hold space and the whole number into pattern space
grab all but the last four digits
replace each digit with a mask character
append the last four digits from hold space to the end of the masked digits in pattern space (a newline comes along for free)
get rid of the newline
try this, you don't have to create complicated regex
var1="CARD_NUMBER=3737291039299199"
IFS="="
set -- $var1
cardnumber=$2
echo $cardnumber | awk 'BEGIN{OFS=FS=""}{for(i=1;i<=NF-4 ;i++){ $i="*"} }1'
output
$ ./shell.sh
************9199
I'm not much of a sed guru, and thus I cannot manage to do it in only one command, though there surely are ways. But with two sed commands, here is what I got:
sed -e 's/CARD_NUMBER=\([0-9]*\)\([0-9]\{4\}\)/\1\
\2/' | sed -e '1s/./x/g ; N ; s/\n//'
Please note the embedded newline.
Because sed works by lines, I first break the card number into the initial part and the last four digits, separating them by a newline (the first sed command). Then, I mask the initial part (1s/./x/g), and remove the new line (N ; s/\n//).
Good luck!