Using gawk or sed to inplace change line after or before every other match - perl

I am trying to change the comment format for many files. Currently I have
//-----------------------------------------------------------
// NAME : Class
// DESCRIPTION : Vague information
//-----------------------------------------------------------
The number of - may be different with each file, or even with each case. I would like something like this to keep readability with the rest of the code while still allowing compatibility with a documenting program:
//-----------------------------------------------------------
/*!
// NAME : Class
// DESCRIPTION : Vague information
*/
//-----------------------------------------------------------
But this is also acceptable, and for sure easier:
/*!
// NAME : Class
// DESCRIPTION : Vague information
*/
For a start, I have tried something like
$ gawk -i inplace '/^\/\/--.*/&&v++%2 {sub(/^\/\/--.*, "\/\*!")}' filename.cpp
and then
$ gawk -i inplace '/^\/\/--.*/{sub(/^\/\/--.*, "\*\/")}' filename.cpp
which obviously has some syntax errors; the goal was to change every odd instance of //---* with /*!, and then replacing //---* with */, and for just one file, instead of many .cpp and .h/.hpp files.

Given:
$ cat file
//-----------------------------------------------------------
// NAME : Class
// DESCRIPTION : Vague information
//-----------------------------------------------------------
With THIS regex, you can use perl:
$ perl -0777 -pE 's/^(\/\/-+\R)(\/\/[\s\S]*?)(^\/\/-+\R?)/\1\/\*!\n\2\*\/\n\3/gm' file
//-----------------------------------------------------------
/*!
// NAME : Class
// DESCRIPTION : Vague information
*/
//-----------------------------------------------------------
Perl also has an inplace option.

Using sed
$ sed -E 'N;s~-\n~&/*!\n~;s~\n/+-~\n*/&~' input_file
//-----------------------------------------------------------
/*!
// NAME : Class
// DESCRIPTION : Vague information
*/
//-----------------------------------------------------------

perl -0777 -wnE'
say s{ //-+\n\K (//.*?) (//-+\n) }{/*!\n$1*/\n$2}sgrx
' file_with_comment_blocks.txt
Broken into lines only for readability. Modifiers (at the end of regex) are:
s with it the . matches a newline as well
g repeat throughout the whole string ("global")
r so to return the new string (or the unchanged original). Used here for printing
x ignore spaces, so use them for readability (also allows newlines and #-comments)
This only prints so it can be redirected to a file for a result while it is safe for testing.
Or, change so to rewrite the file in-place
perl -0777 -i.bak -wnE'
s{ //-+\n\K (//.*?) (//-+\n) }{/*!\n$1*/\n$2}sgx
' file_with_comment_blocks.txt
If backup isn't wanted then use -i (no .bak). See Command-line switches in perlrun

I would harness GNU AWK for this task following way, let file.txt content be
//-----------------------------------------------------------
// NAME : Class
// DESCRIPTION : Vague information
//-----------------------------------------------------------
then
awk 'index($0,"//---")==1{++v;if(v%2){print;print "/*!"}else{print "*/";print};next}{print}' file.txt
gives output
//-----------------------------------------------------------
/*!
// NAME : Class
// DESCRIPTION : Vague information
*/
//-----------------------------------------------------------
Explanation: for lines starting with //--- (I use index String function to avoid need to escape /) I do increase v by 1 and then print current line followed by /*! in case v is odd else print */ followed by current line, then I instruct GNU AWK to go to next line to avoid duplicate printing, all others lines are just printed. After you are sure result is compliant with requirement you might add -i inplace after awk.
(tested in GNU Awk 5.0.1)

Using any awk:
$ awk '/^\/\/-+$/{ $0=( ++c%2 ? "/*!" : "*/" ) } 1' file
/*!
// NAME : Class
// DESCRIPTION : Vague information
*/
or if you prefer:
$ awk '/^\/\/-+$/{ $0=( ++c%2 ? $0 ORS "/*!" : "*/" ORS $0 ) } 1' file
//-----------------------------------------------------------
/*!
// NAME : Class
// DESCRIPTION : Vague information
*/
//-----------------------------------------------------------
To change the input file, if you have GNU awk change awk to awk -i inplace or with any awk add >tmp && mv tmp file at the end.

This might work for you (GNU sed):
sed '/^\/\/-\+$/{:a;N;/\n\/\/-\+$/!ba;s/\n.*\n/\n\/*!&*\/\n/}' file
Gather up lines that start and end //----------....
Prepend \n/*! and append */\n between the first and last newline of the collection.
Or if prefered:
sed '/^\/\/-\+$/{:a;N;/\n\/\/-\+$/!ba;s/^[^\n]*/\/*!/;s/\(.*\n\).*/\1*\//}' file
A "belt a braces" version of the first solution, that only applies to lines that begin and end as OP example with all lines commented inbetween:
sed '/^\/\/-\+$/{:a;N;/\n\/\/[^\n]*$/!{:b;P;D}
/\n\/\/-\+$/!ba;s/\n.*\n/\n\/*!&*\/\n/;Tb}' file

Related

Replacing all occurrence after nth occurrence in a line in perl

I need to replace all occurrences of a string after nth occurrence in every line of a Unix file.
My file data:
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
My output data:
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
tried using sed: sed 's/://3g' test.txt
Unfortunately, the g option with the occurrence is not working as expected. instead, it is replacing all the occurrences.
Another approach using awk
awk -v c=':' -v n=2 'BEGIN{
FS=OFS=""
}
{
j=0;
for(i=0; ++i<=NF;)
if($i==c && j++>=n)$i=""
}1' file
$ cat file
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
$ awk -v c=':' -v n=2 'BEGIN{FS=OFS=""}{j=0;for(i=0; ++i<=NF;)if($i==c && j++>=n)$i=""}1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
With GNU awk, using gensub please try following. This is completely based on your shown samples, where OP wants to remove : from 3rd occurrence onwards. Using gensub to segregate parts of matched values and removing all colons from 2nd part(from 3rd colon onwards) in it as per OP's requirement.
awk -v regex="^([^:]*:)([^:]*:)(.*)" '
{
firstPart=restPart=""
firstPart=gensub(regex, "\\1 \\2", "1", $0)
restPart=gensub(regex,"\\3","1",$0)
gsub(/:/,"",restPart)
print firstPart restPart
}
' Input_file
I have inferred based on the limited data you've given us, so it's possible this won't work. But I wouldn't use regex for this job. What you have there is colon delimited fields.
So I'd approach it using split to extract the data, and then some form of string formatting to reassemble exactly what you like:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ( undef, $first, #rest ) = split /:/;
print ":$first:", join ( "", #rest ),"\n";
}
__DATA__
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
This gives you the desired result, whilst IMO being considerably clearer for the next reader than a complicated regex.
You can use the perl solution like
perl -pe 's~^(?:[^:]*:){2}(*SKIP)(?!)|:~~g if /^:account_id:/' test.txt
See the online demo and the regex demo.
The ^(?:[^:]*:){2}(*SKIP)(?!)|: regex means:
^(?:[^:]*:){2}(*SKIP)(?!) - match
^ - start of string (here, a line)
(?:[^:]*:){2} - two occurrences of any zero or more chars other than a : and then a : char
(*SKIP)(?!) - skip the match and go on to search for the next match from the failure position
| - or
: - match a : char.
And only run the replacement if the current line starts with :account_id: (see if /^:account_id:/').
Or an awk solution like
awk 'BEGIN{OFS=FS=":"} /^:account_id:/ {result="";for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result}' test.txt
See this online demo. Details:
BEGIN{OFS=FS=":"} - sets the input/output field separator to :
/^:account_id:/ - line must start with :account_id:
result="" - sets result variable to an empty string
for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result} - iterates over the fields and if the field number is greater than 2, just append the current field value to result, else, append the value + output field separator; then print the result.
I would use GNU AWK following way if n fixed and equal 2 following way, let file.txt content be
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
then
awk 'BEGIN{FS=":";OFS=""}{$2=FS $2 FS;print}' file.txt
output
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
Explanation: use : as field separator and nothing as output field separator, this itself does remove all : so I add : which have to be preserved: 1st (before second column) and 2nd (after second column). Beware that I tested it solely for this data, so if you would want to use it you should firstly test it with more possible inputs.
(tested in gawk 4.2.1)
This might work for you (GNU sed):
sed 's/:/\n/3;h;s/://g;H;g;s/\n.*\n//' file
Replace the third occurrence of : by a newline.
Make a copy of the line.
Delete all occurrences of :'s.
Append the amended line to the copy.
Join the two lines by removing everything from third occurrence of the copy to the third occurrence of the amended line.
N.B. The use of the newline is the best delimiter to use in the case of sed, as the line presented to seds commands are initially devoid of newlines. However the important property of the delimiter is that it is unique and therefore can be any such character as long as it is not found anywhere in the data set.
An alternative solution uses a loop to remove all :'s after the first two:
sed -E ':a;s/^(([^:]*:){2}[^:]*):/\1/;ta' file
With GNU awk for the 3rd arg to match() and gensub():
$ awk 'match($0,/(:[^:]+:)(.*)/,a){ $0=a[1] gensub(/:/,"","g",a[2]) } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
and with any awk in any shell on every Unix box:
$ awk 'match($0,/:[^:]+:/){ tgt=substr($0,1+RLENGTH); gsub(/:/,"",tgt); $0=substr($0,1,RLENGTH) tgt } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

How to replace a block of code between two patterns with blank lines?

I am trying replace a block of code between two patterns with blank lines
Tried using below command
sed '/PATTERN-1/,/PATTERN-2/d' input.pl
But it only removes the lines between the patterns
PATTERN-1 : "=head"
PATTERN-2 : "=cut"
input.pl contains below text
=head
hello
hello world
world
morning
gud
=cut
Required output :
=head
=cut
Can anyone help me on this?
$ awk '/=cut/{f=0} {print (f ? "" : $0)} /=head/{f=1}' file
=head
=cut
To modify the given sed command, try
$ sed '/=head/,/=cut/{//! s/.*//}' ip.txt
=head
=cut
//! to match other than start/end ranges, might depend on sed implementation whether it dynamically matches both the ranges or statically only one of them. Works on GNU sed
s/.*// to clear these lines
awk '/=cut/{found=0}found{print "";next}/=head/{found=1}1' infile
# OR
# ^ to take care of line starts with regexp
awk '/^=cut/{found=0}found{print "";next}/^=head/{found=1}1' infile
Explanation:
awk '/=cut/{ # if line contains regexp
found=0 # set variable found = 0
}
found{ # if variable found is nonzero value
print ""; # print ""
next # go to next line
}
/=head/{ # if line contains regexp
found=1 # set variable found = 1
}1 # 1 at the end does default operation
# print current line/row/record
' infile
Test Results:
$ cat infile
=head
hello
hello world
world
morning
gud
=cut
$ awk '/=cut/{found=0}found{print "";next}/=head/{found=1}1' infile
=head
=cut
This might work for you (GNU sed):
sed '/=head/,/=cut/{//!z}' file
Zap the lines between =head and =cut.

Use sed to replace word in 2-line pattern

I try to use sed to replace a word in a 2-line pattern with another word. When in one line the pattern 'MACRO "something"' is found then in the next line replace 'BLOCK' with 'CORE'. The "something" is to be put into a reference and printed out as well.
My input data:
MACRO ABCD
CLASS BLOCK ;
SYMMETRY X Y ;
Desired outcome:
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
My attempt in sed so far:
sed 's/MACRO \([A-Za-z0-9]*\)/,/ CLASS BLOCK ;/MACRO \1\n CLASS CORE ;/g' input.txt
The above did not work giving message:
sed: -e expression #1, char 30: unknown option to `s'
What am I missing?
I'm open to one-liner solutions in perl as well.
Thanks,
Gert
Using a perl one-liner in slurp mode:
perl -0777 -pe 's/MACRO \w+\n CLASS \KBLOCK ;/CORE ;/g' input.txt
Or using a streaming example:
perl -pe '
s/^\s*\bCLASS \KBLOCK ;/CORE ;/ if $prev;
$prev = $_ =~ /^MACRO \w+$/
' input.txt
Explanation:
Switches:
-0777: Slurp files whole
-p: Creates a while(<>){...; print} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
When in one line the pattern 'MACRO "something"' is found then in the
next line replace 'BLOCK' with 'CORE'.
sed works on lines of input. If you want to perform substitution on the next line of a specified pattern, then you need to add that to the pattern space before being able to do so.
The following might work for you:
sed '/MACRO/{N;s/\(CLASS \)BLOCK/\1CORE/;}' filename
Quoting from the documentation:
`N'
Add a newline to the pattern space, then append the next line of
input to the pattern space. If there is no more input then sed
exits without processing any more commands.
If you want to make use of address range as in your attempt, then you need:
sed '/MACRO/,/CLASS BLOCK/{s/\(CLASS\) BLOCK/\1 CORE/}' filename
I'm not sure why do you need a backreference for substituting the macro name.
You could try this awk command also,
awk '{print}/MACRO/ {getline; sub (/BLOCK/,"CORE");{print}}' file
It prints all the lines as it is and do the replacing action on seeing a word MACRO on a line.
Since getline has so many pitfall I try not to use it, so:
awk '/MACRO/ {a++} a==1 {sub(/BLOCK/,"CORE")}1' file
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
This could do it
#!awk -f
BEGIN {
RS = ";"
}
/MACRO/ {
sub("BLOCK", "CORE")
}
{
printf s++ ? ";" $0 : $0
}
"line" ends with ;
sub BLOCK for CORE in "lines" with MACRO
print ; followed by "line" unless first line

Search for a particular multiline pattern using awk and sed

I want to read from the file /etc/lvm/lvm.conf and check for the below pattern that could span across multiple lines.
tags {
hosttags = 1
}
There could be as many white spaces between tags and {, { and hosttags and so forth. Also { could follow tags on the next line instead of being on the same line with it.
I'm planning to use awk and sed to do this.
While reading the file lvm.conf, it should skip empty lines and comments.
That I'm doing using.
data=$(awk < cat `cat /etc/lvm/lvm.conf`
/^#/ { next }
/^[[:space:]]*#/ { next }
/^[[:space:]]*$/ { next }
.
.
How can I use sed to find the pattern I described above?
Are you looking for something like this
sed -n '/{/,/}/p' input
i.e. print lines between tokens (inclusive)?
To delete lines containing # and empty lines or lines containing only whitespace, use
sed -n '/{/,/}/p' input | sed '/#/d' | sed '/^[ ]*$/d'
space and a tab--^
update
If empty lines are just empty lines (no ws), the above can be shortened to
sed -e '/#/d' -e '/^$/d' input
update2
To check if the pattern tags {... is present in file, use
$ tr -d '\n' < input | grep -o 'tags\s*{[^}]*}'
tags { hosttags = 1# this is a comment}
The tr part above removes all newlines, i.e. makes everything into one single line (will work great if the file isn't to large) and then search for the tags pattern and outputs all matches.
The return code from grep will be 0 is pattern was found, 1 if not.
Return code is stored in variable $?. Or pipe the above to wc -l to get the number of matches found.
update3
regex for searcing for tags { hosttags=1 } with any number of ws anywhere
'tags\s*{\s*hosttags\s*=\s*1*[^}]*}'
try this line:
awk '/^\s*#|^\s*$/{next}1' /etc/lvm/lvm.conf
One could try preprocessing the file first, removing commments and empty lines and introducing empty lines behind the closing curly brace for easy processing with the second awk.
awk 'NF && $1!~/^#/{print; if(/}/) print x}' file | awk '/pattern/' RS=

sed - comment a matching line and x lines after it

I need help with using sed to comment a matching lines and 4 lines which follows it.
in a text file.
my text file is like this:
[myprocess-a]
property1=1
property2=2
property3=3
property4=4
[anotherprocess-b]
property1=gffgg
property3=gjdl
property2=red
property4=djfjf
[myprocess-b]
property1=1
property4=4
property2=2
property3=3
I want to prefix # to all the lines having text '[myprocess' and 4 lines that follows it
expected output:
#[myprocess-a]
#property1=1
#property2=2
#property3=3
#property4=4
[anotherprocess-b]
property1=gffgg
property3=gjdl
property2=red
property4=djfjf
#[myprocess-b]
#property1=1
#property4=4
#property2=2
#property3=3
Greatly appreciate your help on this.
You can do this by applying a regular expression to a set of lines:
sed -e '/myprocess/,+4 s/^/#/'
This matches lines with 'myprocess' and the 4 lines after them. For those 4 lines it then inserts a '#' at the beginning of the line.
(I think this might be a GNU extension - it's not in any of the "sed one liner" cheatsheets I know)
sed '/\[myprocess/ { N;N;N;N; s/^/#/gm }' input_file
Using string concatenation and default action in awk.
http://www.gnu.org/software/gawk/manual/html_node/Concatenation.html
awk '/myprocess/{f=1} f>5{f=0} f{f++; $0="#" $0} 1' foo.txt
or if the block always ends with empty line
awk '/myprocess/{f=1} !NF{f=0} f{$0="#" $0} 1' foo.txt