Replace duplicate character after first pattern - sed

With this data:
aaa:
- bbbb: ccc ddd' {eeee [fff
kkk: mmm nnn" oo pp
I would like to remove all duplicate spaces after the first :, giving:
aaa:
- bbbb: ccc ddd' {eeee [fff
kkk: mmm nnn" oo pp
Using \b doesn't help here.

With perl (assuming there's only one : per line):
$ perl -pe 's/ +(?!.*:)/ /g' ip.txt
aaa:
- bbbb: ccc ddd' {eeee [fff
kkk: mmm nnn" oo pp
If a line can contain multiple : characters and you still want to squeeze multiple spaces only after the first one, you can use this:
perl -pe 's/^[^:]+(*SKIP)(*F)| +/ /g' ip.txt
With sed:
$ sed -E ':a s/^([^:]+:.*) {2,}/\1 /; ta' ip.txt
aaa:
- bbbb: ccc ddd' {eeee [fff
kkk: mmm nnn" oo pp
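:a defines a label named a, and ta branches back to that label whenever the preceding s command made a substitution. The s command therefore runs again and again, collapsing one run of two or more spaces after the first : per pass, until no such run is left.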
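A minimal sketch of the same label-and-branch loop as a reusable shell function, assuming GNU sed (it relies on ; ending the label name). The name squeeze_after_colon is hypothetical. Because .* is greedy, each pass collapses the last run of two or more spaces still left after the first :, and ta repeats the pass until the substitution stops matching; every pass shortens the line, so the loop always terminates.

```shell
#!/bin/sh
# squeeze_after_colon (hypothetical name): collapse every run of 2+ spaces
# that appears after the first ':' on each line. Assumes GNU sed.
squeeze_after_colon() {
  # :a    define label "a"
  # s///  ^[^:]+: anchors at the first ':'; the greedy .* makes each pass
  #       collapse the last run of 2+ spaces still left after that colon
  # ta    jump back to :a if the substitution succeeded
  sed -E ':a;s/^([^:]+:.*) {2,}/\1 /;ta' "$@"
}

# Spaces before the first ':' are left alone; runs after it are squeezed.
printf '%s\n' 'x  y:  a   b    c' | squeeze_after_colon
# prints: x  y: a b c
```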

This might work for you (GNU sed):
sed -E ':a;s/(^[^:]*:( \S+)*) {2,}/\1 /;ta' file
Match on two or more spaces following a : and replace by a single space, then repeat until no more matches.
Alternative:
sed -E ':a;s/(^[^:]*:[^ ]*) +/\1\n/;ta;y/\n/ /' file
Replace each run of one or more spaces after the first : with a newline, and repeat until no more matches. Then translate all newlines back to spaces (newlines are never present in the pattern space unless the script puts them there, so they are a safe temporary marker).
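A sketch of this newline-marker variant as a function, assuming GNU sed (\n in the replacement and in y is a GNU extension); squeeze_with_marker is a hypothetical name. Each pass turns at least one space into a newline, so the ta loop terminates even when a run is only a single space long.

```shell
#!/bin/sh
# squeeze_with_marker (hypothetical name): squeeze runs of spaces after the
# first ':' using a newline as a temporary marker. Assumes GNU sed.
squeeze_with_marker() {
  # [^ ]* may step over earlier markers (a newline is not a space), so each
  # pass finds the next run of spaces after the first ':' and replaces it
  # with a newline; y/\n/ / then turns every marker back into one space.
  sed -E ':a;s/(^[^:]*:[^ ]*) +/\1\n/;ta;y/\n/ /' "$@"
}

printf '%s\n' 'x  y:  a   b    c' | squeeze_with_marker
# prints: x  y: a b c
```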

sed or awk: delete/comment 3 lines before a pattern

I want to delete/comment the 3 lines before a pattern (including the line with the pattern).
How can I achieve this with a sed command?
Ref:
sed or awk: delete n lines following a pattern
The referenced post shows how to do this for the lines after a pattern match, but I need to do it for the lines before the match.
define host{
use xxx;
host_name pattern;
alias yyy;
address zzz;
}
For example, the sed command below comments out (prefixes with '#') the matching line and the 3 lines after it:
sed -e '/pattern/,+3 s/^/#/' file.cfg
define host{
use xxx;
#host_name pattern;
#alias yyy;
#address zzz;
#}
How can I do the same for the lines before the pattern?
Can anyone help me resolve this?
If tac is allowed:
tac file | sed -e '/pattern/,+3 s/^/#/' | tac
If tac isn't allowed:
sed -e '1!G;h;$!d' file | sed -e '/pattern/,+3 s/^/#/' | sed -e '1!G;h;$!d'
(source: http://sed.sourceforge.net/sed1line.txt)
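The 1!G;h;$!d one-liner from that list can be read command by command. Below it is wrapped as a function (reverse_lines is a hypothetical name) with each step commented; it uses only POSIX sed commands, but it keeps the entire input in the hold space, so it suits modest file sizes.

```shell
#!/bin/sh
# reverse_lines (hypothetical name): print input lines in reverse order,
# like tac, using only POSIX sed commands.
reverse_lines() {
  # 1!G  on every line except the first, append the hold space
  #      (all previous lines, already reversed) to the pattern space
  # h    copy the result back into the hold space
  # $!d  delete (print nothing) unless this is the last line; on the last
  #      line the pattern space holds the whole input in reverse order
  sed -e '1!G;h;$!d' "$@"
}

printf '%s\n' one two three | reverse_lines
# prints three, two, one (one per line)
```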
Reverse the file, comment out the matching line and the 2 lines after it (3 lines in total), then reverse the file back:
tac file | sed '/pattern/ {s/^/#/; N; N; s/\n/&#/g;}' | tac
#define host{
#use xxx;
#host_name pattern;
alias yyy;
address zzz;
}
Although I think awk is a little easier to read:
tac file | awk '/pattern/ {c=3} c-- > 0 {$0 = "#" $0} 1' | tac
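The awk counter generalises to any number of lines. A sketch, assuming GNU coreutils tac is available; comment_before is a hypothetical name, and its second argument is used as an awk dynamic regular expression. In the reversed stream, a match sets c to n, so the matching line and the next n-1 reversed lines (the n-1 lines before it in the original order) get a # prefix.

```shell
#!/bin/sh
# comment_before N REGEX (hypothetical name): read stdin, prefix '#' to each
# line matching REGEX and to the N-1 lines that precede it. Assumes tac.
comment_before() {
  tac | awk -v n="$1" -v re="$2" '
    $0 ~ re  { c = n }               # a match (re)starts the countdown
    c-- > 0  { $0 = "#" $0 }         # comment while the counter is positive
    1                                # print every line
  ' | tac
}

printf '%s\n' 'define host{' 'use xxx;' 'host_name pattern;' 'alias yyy;' '}' |
  comment_before 3 pattern
```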
This might work for you (GNU sed):
sed ':a;N;s/\n/&/3;Ta;/pattern[^\n]*$/s/^/#/mg;P;D' file
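Gather up 4 lines in the pattern space; if the last line contains pattern, insert # at the beginning of each line in the pattern space. P then prints the first line and D deletes it, so the 4-line window moves forward one line at a time.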
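A sketch of that sliding window as a function, assuming GNU sed (T, the m flag and \n inside a bracket expression are GNU extensions). comment_last4 is a hypothetical name; its argument is spliced into the sed script as a basic regular expression, so it must not contain /. As with the original one-liner, a match on any of the first 3 lines never ends a full 4-line window, so it does not trigger the commenting.

```shell
#!/bin/sh
# comment_last4 PATTERN (hypothetical name): prefix '#' to each line that
# matches PATTERN and to the 3 lines before it. Assumes GNU sed.
comment_last4() {
  # :a;N;s/\n/&/3;Ta   keep appending lines until the window holds 4 lines
  #                    (once a 3rd newline exists, the no-op s succeeds)
  # /PATTERN[^\n]*$/   the last line of the window contains PATTERN
  # s/^/#/mg           then put '#' at the start of every line in the window
  # P;D                print and drop the first line, then slide the window
  sed ":a;N;s/\n/&/3;Ta;/$1[^\n]*\$/s/^/#/mg;P;D"
}

printf '%s\n' a b c 'host_name pattern' e | comment_last4 pattern
```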
To delete those 4 lines, use:
sed ':a;N;s/\n/&/3;Ta;/pattern[^\n]*$/d;P;D' file
To delete the 3 lines before pattern but keep the line containing pattern, use:
sed ':a;N;s/\n/&/3;Ta;/pattern[^\n]*$/s/.*\n//;P;D' file

Conversion of date format using sed or awk

I am new to Unix and am looking for an answer to the problem below.
I have a semicolon-delimited file like this:
Frank;01012019;01012020;woodcrest wack st
Mark;01012019;01012020;Annunciation st
Fred;01022019;01012020;Baker st
The date format in the input file is in DDMMYYYY format. I need the date to be converted into YYYYMMDD format as below.
Expected Output:
Frank;20190101;20200101;woodcrest wack st
Mark;20190101;20200101;Annunciation st
Fred;20190201;20200101;Baker st
Please suggest answers using sed or awk.
With GNU sed:
sed -r 's/;([0-9]{2})([0-9]{2})([0-9]{4})/;\3\2\1/g' file.csv
Output:
Frank;20190101;20200101;woodcrest wack st
Mark;20190101;20200101;Annunciation st
Fred;20190201;20200101;Baker st
awk -F';' '{print $1";"substr($2, 5, 4) substr($2, 3, 2) substr($2, 1, 2)";"substr($3, 5, 4) substr($3, 3, 2) substr($3, 1, 2)";"$4}' file
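A slightly tidier sketch of the same idea: setting both FS and OFS to ; lets awk rewrite fields 2 and 3 in place, and a small helper (ymd, a hypothetical name, as is to_yyyymmdd) keeps the substr() positions in one place. For DDMMYYYY the day is characters 1-2, the month 3-4 and the year 5-8.

```shell
#!/bin/sh
# to_yyyymmdd (hypothetical name): rewrite fields 2 and 3 from DDMMYYYY
# to YYYYMMDD in ';'-separated input read from stdin or the given files.
to_yyyymmdd() {
  awk '
    function ymd(d) { return substr(d, 5, 4) substr(d, 3, 2) substr(d, 1, 2) }
    BEGIN { FS = OFS = ";" }
    { $2 = ymd($2); $3 = ymd($3); print }
  ' "$@"
}

printf '%s\n' 'Fred;01022019;01012020;Baker st' | to_yyyymmdd
# prints: Fred;20190201;20200101;Baker st
```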
sed -E 's/([0-9]{2})([0-9]{2})([0-9]{4});/\3\2\1;/g' data
#=> Frank;20190101;20200101;woodcrest wack st
#=> Mark;20190101;20200101;Annunciation st
#=> Fred;20190201;20200101;Baker st
\1, \2 and \3 refer to the text captured by each pair of parentheses, i.e. DD, MM and YYYY here. s is sed's substitute command.
The trailing g flag replaces all matches on the line; without it, sed replaces only the first match (the first date).
If the input is as consistently formatted as you describe, sed is actually the easier tool for this.
PS: -E enables extended regular expressions and works with both GNU sed and BSD (macOS) sed.
It saves you from having to escape ( ) { }.
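To see the effect of the g flag described above, here is a small comparison (convert_all and convert_first are hypothetical names). Both use the same regular expression; without g only the first DDMMYYYY; group on the line is rewritten.

```shell
#!/bin/sh
# Both functions use the same regex; only the trailing g flag differs.
convert_all()   { sed -E 's/([0-9]{2})([0-9]{2})([0-9]{4});/\3\2\1;/g' "$@"; }
convert_first() { sed -E 's/([0-9]{2})([0-9]{2})([0-9]{4});/\3\2\1;/' "$@"; }

line='Fred;01022019;01012020;Baker st'
printf '%s\n' "$line" | convert_all     # Fred;20190201;20200101;Baker st
printf '%s\n' "$line" | convert_first   # Fred;20190201;01012020;Baker st
```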
With Perl
$ cat sadhiya.txt
Frank;01012019;01012020;woodcrest wack st
Mark;01012019;01012020;Annunciation st
Fred;01022019;01012020;Baker st
$ perl -F";" -lane ' s/(.{2})(.{2})(.{4})/$3$2$1/g for @F[1..2]; print join(";",@F) ' sadhiya.txt
Frank;20190101;20200101;woodcrest wack st
Mark;20190101;20200101;Annunciation st
Fred;20190201;20200101;Baker st
With sed:
sed -E -n 's/(.*);([0-9]{2})([0-9]{2})([0-9]{4});([0-9]{2})([0-9]{2})([0-9]{4});(.*)/\1;\4\3\2;\7\6\5;\8/p' file_name

Replace pattern in specific column in sed

I have a tab-separated file with two columns, like below:
BB_12 100_AA
BB_13 101_AB
BB_14 102_AD
BB_15 103_AC
I want to remove the number_ in the second column (i.e. replace number_ with nothing). I tried sed substitutions in the following ways, unsuccessfully:
sed 's/\d+\_//g' infile
sed 's/(\d+\_)//g' infile
But none of these tweaks worked; it looks like sed is not searching in the 2nd column. How can I modify this? The expected output is:
BB_12 AA
BB_13 AB
BB_14 AD
BB_15 AC
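You may just process the last column with sed. Using [:space:] rather than a literal space means it also works when the columns are separated by tabs:
sed -E 's/[^[:space:]]*_([^[:space:]]*)[[:space:]]*$/\1/' file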
The output:
BB_12 AA
BB_13 AB
BB_14 AD
BB_15 AC
Awk alternative:
awk '{ sub(/^[^ ]+_/, "", $2) }1' OFS='\t' file
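A sketch of the awk approach as a function (strip_number_prefix is a hypothetical name). When sub() changes $2, awk rebuilds the whole record using OFS, which is why OFS is set to a tab. This version matches ^[0-9]+_ so that only a numeric prefix is removed, an assumption based on the number_ description in the question.

```shell
#!/bin/sh
# strip_number_prefix (hypothetical name): in whitespace-separated input,
# remove a leading "digits_" prefix from column 2 and write the result
# tab-separated.
strip_number_prefix() {
  awk 'BEGIN { OFS = "\t" } { sub(/^[0-9]+_/, "", $2) } 1' "$@"
}

printf 'BB_12\t100_AA\nBB_13\t101_AB\n' | strip_number_prefix
```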
The following simple sed command may help:
sed 's/\([^ ]*\) \([^_]*\)_\(.*\)/\1 \3/g' Input_file
The output will be as follows:
BB_12 AA
BB_13 AB
BB_14 AD
BB_15 AC

sed replace if part of word matches

My text looks like this:
cat
catch
cat_mouse
catty
I want to replace "cat" with "dog".
When I do
sed "s/cat/dog/"
my result is:
dog
catch
cat_mouse
catty
How do I replace with sed if only part of the word matches?
There's a mistake: you are missing the g modifier:
sed 's/cat/dog/g'
g
Apply the replacement to all matches to the regexp, not just the first.
See
http://www.gnu.org/software/sed/manual/html_node/The-_0022s_0022-Command.html
http://sed.sourceforge.net/sedfaq3.html#s3.1.3
If you want to replace cat with dog only when it is part of a longer word:
$ perl -pe 's/cat(?=.)/dog/' file.txt
cat
dogch
dog_mouse
dogty
This uses a positive lookahead; see http://www.perlmonks.org/?node_id=518444
If you really want sed:
sed '/^cat$/!s/cat/dog/' file.txt
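The sed one-liner relies on address negation: /^cat$/ selects lines that consist of exactly cat, and ! applies the s command to every other line. The sketch below (by_negation and by_capture are hypothetical names, POSIX sed only) compares it with a capture-group form that requires at least one more character after cat.

```shell
#!/bin/sh
# Two hypothetical helpers that change "cat" to "dog" only when cat is
# followed by more text on the same line.
by_negation() { sed '/^cat$/!s/cat/dog/' "$@"; }   # skip lines that are exactly "cat"
by_capture()  { sed 's/cat\(.\)/dog\1/' "$@"; }    # require one more character after cat

printf '%s\n' cat catch cat_mouse catty | by_negation
printf '%s\n' cat catch cat_mouse catty | by_capture
```

Both print cat, dogch, dog_mouse and dogty for this input. They differ on a line such as "a cat": it is not exactly cat, so by_negation changes it, while by_capture leaves it alone because nothing follows cat.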
bash-3.00$ cat t
cat
catch
cat_mouse
catty
To replace cat only if it is part of a longer string (at least one more character follows it):
bash-3.00$ sed 's/cat\(.\)/dog\1/' t
cat
dogch
dog_mouse
dogty
To replace all occurrences of cat:
bash-3.00$ sed 's/cat/dog/g' t
dog
dogch
dog_mouse
dogty
An awk solution for this:
awk '{gsub("cat","dog",$0); print}' temp.txt

Printing next line with sed

I want to print the line that follows a matching word, using sed.
I tried this command, but it gives an error:
sed -n '/<!\[CDATA\[\]\]>/ { N p}/' test.xml
What about grep -A 1 -e regex file? It prints each matching line together with the line below it.
With sed, looking for the pattern "dd", the following works as you would expect:
sed -n '/dd/ {n;p}' file
For file content:
dd
aa
ss
aa
It prints:
aa
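How the sed command works: -n turns off automatic printing; on a matching line, n replaces the pattern space with the next input line and p prints it. Below is a sketch (print_next is a hypothetical name; its argument must not contain /). The line pulled in by n is not itself tested against the pattern, so two consecutive matching lines produce only one printed line. The ;} form is used because some non-GNU seds, such as BSD sed, require a ; before the closing brace.

```shell
#!/bin/sh
# print_next REGEX (hypothetical name): print the line that follows each
# line matching REGEX (a sed BRE that must not contain '/').
print_next() {
  # -n  no automatic printing
  # n   replace the pattern space with the next input line
  # p   print it
  sed -n "/$1/{n;p;}"
}

printf '%s\n' dd aa ss aa | print_next dd   # prints: aa
printf '%s\n' dd dd x | print_next dd       # prints: dd (the 2nd dd is consumed by n)
```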
Use awk:
awk '/pattern/{getline;print}' file
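One caveat with getline: when the pattern matches the last line, getline has nothing to read, returns 0 and leaves $0 unchanged, so the matching line itself gets printed. A sketch that checks the return value instead (next_line is a hypothetical name):

```shell
#!/bin/sh
# next_line REGEX (hypothetical name): print the line after each match of
# REGEX; print nothing when the match is on the last line.
next_line() {
  # getline line returns 1 on success and 0 at end of input
  awk -v re="$1" '$0 ~ re { if ((getline line) > 0) print line }'
}

printf '%s\n' dd aa ss aa | next_line dd   # prints: aa
printf '%s\n' aa dd | next_line dd         # prints nothing
```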