Global Match Anchors \G: rolling lover over an input file - perl

See:
https://www.oreilly.com/library/view/mastering-perl/9780596527242/ch02.html
I'm having some trouble getting the perl Global Match Anchors \G to work with my input file proved below with my perl code... I would have thought \G keeps picking up where it left off in the previous match and matches from that position in the string?
note, if i uncomment these two line it works:
#!/bin/perl
$vlog = "out/tb_asc.sv";
open(my $F, "$vlog") || die("cannot open file: $vlog\n");
#lines = <$F>;
chomp(#lines);
$bigline = join("\n", #lines);
close($F);
$movingline = $bigline;
print ">> << START\n";
$moving = $bigline;
$moving =~ s|//.*$||mg;
$moving =~ s|\s+$||mg;
while(1) {
# Blank Linke
if ($moving =~ /\G\n/g) {
#$moving = substr $moving, $+[0]+1; # <= doesn't \G anchor imply this line?
next;
}
# Precompiler Line
if ($moving =~ /\G\s*`(\w+)(\s+(.*))?\n/g) {
$vpccmd = $1;
$vpcarg1 = $3;
#$moving = substr $moving, $+[0]+1;
print "vpc_cmd($vpccmd) vpc_arg1($vpcarg1)\n";
next;
}
$c = nextline($moving);
print "\n=> processing:[$c]\n";
die("parse error\n");
}
sub nextline($) {
#c = split(/\n/, $moving);
$c = $c[0];
chomp($c);
return $c;
}
sample input file: out/tb_asc.sv
`timescale 1ns / 1ps
`define DES tb_asc.DES.HS86.CORE
`ifdef HS97_MODE
`define SER_DUT HS97
`ifndef HS78_MODE
`define SER_DUT HS78
`else //1:HS78_MODE
`define SER_DUT HS89
`define SER tb_asc.SER.`SER_DUT.CORE
`ifdef MULTIPLE_SERS
`define SER_1 tb_asc.SER_1.`SER_DUT.CORE
`define SER_2 tb_asc.SER_2.`SER_DUT.CORE
`define SER_3 tb_asc.SER_3.`SER_DUT.CORE
`else //1:MULTIPLE_SERS
`define SER_1 tb_asc.SER.`SER_DUT.CORE
`define SER_2 tb_asc.SER.`SER_DUT.CORE
`define SER_3 tb_asc.SER.`SER_DUT.CORE
`define REPCAPCAL DIGITAL_TOP.RLMS_A.REPCAL.U_REPCAPCAL_CTRL
`define MPU_D POWER_MGR.Ism.por_pwdnb_release
`define DFE_OUT RXD.EC.Eslicer.QP
`define DFE_OUT_SER RXD.EC.Eslicer.QP
//beg-include-1 ser_reg_defs_macros.sv
//FILE: /design/proj/hs89/users/HS89D-0A/digital/modules/cc/src/ser_reg_defs_macros.sv
`define CFG_BLOCK "CFG_BLOCK"
`define DEV_ADDR "DEV_ADDR"
`define RX_RATE "RX_RATE"
`define TX_RATE "TX_RATE"
`define DIS_REM_CC "DIS_REM_CC"
`define DIS_LOCAL_CC "DIS_LOCAL_CC"
NOTE: this version works but doesn't use \G:
while(1) {
# Blank Linke
if ($moving =~ /\A$/m) {
$moving = substr $moving, $+[0]+1;
next;
}
# Precompiler Line
if ($moving =~ /\A\s*`/) {
$moving =~ /\A\s*`(\w+)(\s+(.*))?$/m;
$vpccmd = $1;
$vpcarg1 = $3;
$moving = substr $moving, $+[0]+1;
print "vpc_cmd($vpccmd) vpc_arg1($vpcarg1)\n";
next;
}
$c = nextline($moving);
print "\n=> processing:[$c]\n";
die("parse error\n");
}
I prefer to do this using \G because substr uses a lot of CPU time with a large input file.

The bit you're missing is that is that an unsuccessful match resets the position.
$ perl -Mv5.14 -e'$_ = "abc"; /./g; /x/g; say $& if /./g;'
a
Unless you also use /c, that is.
$ perl -Mv5.14 -e'$_ = "abc"; /./gc; /x/gc; say $& if /./gc;'
b

When your match fails, In the link to Mastering Perl that you provide, I wrote:
I have a way to get around Perl resetting the match position. If I want to try a match without resetting the starting point even if it fails, I can add the /c flag, which simply means to not reset the match position on a failed match. I can try something without suffering a penalty. If that doesn’t work, I can try something else at the same match position. This feature is a poor man’s lexer.
My example that I think you are trying to use has /gc on all the matches using \G.

Related

Perl filter with substitution

I am attempting to create a Perl script that filters data presented on STDIN, changing all occurrences of
one string to another and outputting all input lines, changed and unchanged to STDOUT. FROMSTRING and TOSTRING can be PERL-compatible regular expressions. I am unable to get matching output.
Here is an example of what I am trying to achieve.
echo "Today is Saturday" | f.pl 'a' '#'
Output Tod#y is S#turd#y.
echo io | filter.pl '([aeiou])([aeiou])' '$2$1'
Output oi.
#!/usr/bin/perl
use strict;
use warnings;
if (#ARGV != 2){
print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n"
}
exit 1;
my $FROM = $ARGV[0];
my $TO = $ARGV[1];
my $inLine = "";
while (<STDIN>){
$inLine = $_;
$inLine =~ s/$FROM/$TO/;
print $inLine
}
exit 0;
First off, the replacement part of a s/.../.../ operation is not a regex; it works like a double-quoted string.
There are a couple of issues with your code.
Your exit 1; statement appears in the middle of the main code, not in the error block. You probably want:
if (#ARGV != 2) {
print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n";
exit 1;
}
You're missing a g flag if you want multiple substitutions to happen in the same line:
$inLine =~ s/$FROM/$TO/g;
There's no need to predeclare $inLine; it's only used in one block.
There's also no need to read a line into $_ just to copy it into $inLine.
It's common to use $names_like_this for variables and functions, not $namesLikeThis.
You can use $0 instead of hardcoding the program name in the error message.
exit 0; is redundant at the end.
The following is closer to how I'd write it:
#!/usr/bin/perl
use strict;
use warnings;
if (#ARGV != 2) {
die "Usage: $0 FROMSTRING TOSTRING\n";
}
my ($from, $to) = #ARGV;
while (my $line = readline STDIN) {
$line =~ s/$from/$to/g;
print $line;
}
That said, none of this addresses your second example with '$2$1' as the replacement. The above code won't do what you want because $to is a plain string. Perl won't scan it to look for things like $1 and replace them.
When you write "foo $bar baz" in your code, it means the same thing as 'foo ' . $bar . ' baz', but this only applies to code, i.e. stuff that literally appears in your source code. The contents of $bar aren't re-scanned at runtime to expand e.g. \n or $quux. This also applies to $1 and friends, which are just normal variables.
So how do you get '$2$1' to work?
One way is to mess around with eval, but I don't like it because, well, it's eval: If you're not very careful, it would allow someone to execute arbitrary code by passing the right replacement "string".
Doing it without eval is possible and even easy with e.g. Data::Munge::replace:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Munge qw(replace);
if (#ARGV != 2) {
die "Usage: $0 FROMSTRING TOSTRING\n";
}
my ($from, $to) = #ARGV;
while (my $line = readline STDIN) {
print replace($line, $from, $to, 'g');
}
replace works like JavaScript's String#replace in that it expands special $ sequences.
Doing it by hand is also possible but slightly annoying because you basically have to treat $to as a template and expand all $ sequences by hand (e.g. by using another regex substitution):
# untested
$line =~ s{$from}{
my #start = #-;
my #stop = #+;
(my $r = $to) =~ s{\$([0-9]+|\$)}{
$1 eq '$'
? '$'
: substr($from, $start[$1], $stop[$1] - $start[$1])
}eg;
$r
}eg;
(This does not implement braced groups such as ${1}, ${2}, etc. Those are left as an exercise for the reader.)
This code is sufficiently annoying to write (and look at) that I much prefer using a module like Data::Munge for this sort of thing.
three errors found:
; after error message
exit 1;
$inLine =~ s/$FROM/$TO/g;
like:
#!/usr/bin/perl
use strict;
use warnings;
if (#ARGV != 2){
print STDERR "Usage: ./filter.pl FROMSTRING TOSTRING\n";
exit 1;
}
my $FROM = $ARGV[0];
my $TO = $ARGV[1];
my $inLine = "";
while (<STDIN>){
$inLine = $_;
$inLine =~ s/$FROM/$TO/g;
print $inLine
}
exit 0;

How to append and the line once the matched pattern is found

I want to search the string in a file and if the search string is found
then I want to replace the three line based on value in curly braces.
I was going through one of solution from stack overflow
Perl - Insert lines after a match is found in a file
is-found-in-a-file
But the things are not working for me
input_file:
abcdef1{3} { 0x55, 0x55, 0x55 }
abcdef2{2} { 0x55, 0x55}
code:
use strict;
use warnings;
my $ipfile = 'input.txt';
open my $my_fh "<", $ipfile or die "Couldn't open input file: $!";
while(<$my_fh>)
{
if (/$abcdef1/)
{
s/abcdef1{3} {\n/abcdef1{3} {\nabcdef1 0x55\nabcdef1 0x55\nabcdef1
0x55\n/gm;
}
}
expected output:
abcdef1 0x55
abcdef1 0x55
abcdef1 0x55
abcdef2 0x55
abcdef2 0x55
Any help with explanation would be grateful.
Note in perlre and RE.info that using $ and { ... } have special meanings within regular expressions. You may not see output because you are missing at least one print statement. The first curly enclosure (ie: {\d+}) could be optional unless you want to validate the length of the series in the second enclosure.
Your loop may look something like:
while (<$my_fh>) {
if (/
^ # beginning of line
([^{]+) # the base pattern captured in $1 ("non-left curly braces")
.* # any number of characters
\{\s*(.*?)\s*\} # the data section surrounded by curlies captured in $2
$ # end of line
/x) # allow whitespace and comments
{
for my $val (split /, /, $2) {
print "$1 $val\n";
}
} else {
print;
}
}
Or more tersely:
while (my $line = <$my_fh>) {
if ($line =~ /^([^{]+).*\{\s*(.*?)\s*\}$/) {
$line = '';
$line .= "$1 $_\n" for split /, /, $2;
}
print $line;
}
The ? in the pattern .*? indicates a non-greedy match. In this case, it avoids matching the whitespace next to the second right curly brace.

How can I interpolate literal escape sequences with the substitution operator?

This code:
my $st = "37a64";
my $grep = '\n';
$st =~ s/a/$grep/;
print $st;
Prints:
37\n64
I would like to see the following output:
37
64
But I can only modify the \n and regex options because I'm importing $st from another file.
I don't know of an existing module to do that.
my %tr = (
n => "\n",
r => "\r",
t => "\t",
# ...
);
$grep =~ s{\\(?:(\W)|(.))}{
defined($1) ? $1 :
defined($tr{$2}) ? $tr{$2} :
do { warn("Unrecognized escapes \\$2"); "\\$2" }
}seg;
Please avoid any recommendation to pass inputs to eval EXPR (sometimes dangerously disguised as s///ee). They are surely buggy and dangerous.
You can use the /ee modifier to evaluate the replacement:
$st =~ s/a/qq("$grep")/ee;
To understand its function, try
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my #st = ("37a64") x 3;
my $grep = '\n';
$st[0] =~ s/a/qq("$grep")/;
say $st[0]; # 37qq("\n")64
$st[1] =~ s/a/qq("$grep")/e;
say $st[1]; # 37"\n"64
$st[2] =~ s/a/qq("$grep")/ee;
say $st[2]; # 37
# 64

Line by line editing to all the files in a folder ( and subfolders ) in perl

I want my perl program to do the substitution of '{' - > '{function('.counter++.')' in all the files except the lines when there is a '{' and a '}' in the same line, and except when the '{' appears one line under a 'typedef' substring.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Find;
my $dir = "C:/test dir";
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push #ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
while (<>)
{
my #lines;
# copy each line from the text file to the string #lines and add a function call after every '{' '
tie #lines, 'Tie::File', $ARGV or die "Can't read file: $!\n"
foreach (#lines)
{
if (!( index (#lines,'}')!= -1 )) # if there is a '}' in the same line don't add the macro
{
s/{/'{function(' . $counter++ . ')'/ge;
print;
}
}
untie #lines; # free #lines
}
what I was trying to do is to go through all the files in #ARGV that i found in my dir and subdirs and for each *.c or *.h file I want to go line by line and check if this line contains '{'. if it does the program won't check if there is a '{' and won't make the substitution, if it doesn't the program will substitute '{' with '{function('.counter++.');'
unfortunately this code does not work. I'm ashamed to say that I'm trying to make it work all day and still no go.I think that my problem is that I'm not really working with lines where I search for '{' but I don't understand why. I would really appreciate some help.
I would also like to add that I am working in windows environment.
Thank You!!
Edit: so far with your help this is the code:
use strict;
use warnings;
use File::Find;
my $dir = "C:/projects/SW/fw/phy"; # use forward slashes in paths
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push #ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
while (<>) {
s/{/'{ function(' . $counter++ . ')'/ge unless /}/;
print;
}
The only thing that is left to do is to make it ignore '{' substitution when it is one line under 'typedef' substring like this:
typedef struct
{
}examp;
I would greatly appreciate your help! Thank you! :)
Edit #2: This is the final code:
use strict;
use warnings;
use File::Find;
my $dir = "C:/exmp";
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push #ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
my $td = 0;
# now process our files
while (<>) {
s/{/'{ function(' . $counter++ . ')'/ge if /{[^}]*$/ && $td == 0;
$td = do { (/typedef/ ? 1 : 0 ) || ( (/=/ ? 1 : 0 ) && (/if/ ? 0 : 1 ) && (/while/ ? 0 : 1 ) && (/for/ ? 0 : 1 ) && (/switch/ ? 0 : 1 ) )};
print;
}
The code does the substitution except when the line above the substitution place included 'typedef',
When the line above it included '=' and no 'if', 'while', 'for' or 'switch' the substitiution will also not happen.
Thank you all for your help!
The -i swith let you presise an extension for backup files.
Using perl:
perl -pe "/{[^}]*\$/&&do{s/{/{function('.counter++.');/}" -i.bak *
or (same result):
perl -pe "s/{/{function('.counter++.');/ if /{[^}]*\$/" -i.bak *
And for processing all files in sub-folder too, this could be simplier to use find:
find . -type f -print0 |
xargs -0 perl -pe "s/{/{function('.counter++.');/ if /{[^}]*\$/" -i.bak
Using GNU sed let you do the job very quickly
sed -e "/{[^}]*\$/{s/{/{function('.counter++.');/}" -i.bak *
Edit For doing modification only if previous line don't contain word typedef:
perl -pe "BEGIN { my \$td=1; };s/{/{function('.counter++.');/ if /{[^}]*\$/ && \$td==1 ; \$td=do{/typedef/?0:1};" -i.bak *
could be written;
perl -pe "
BEGIN { my \$td=0; };
s/{/{function('.counter++.');/ if /{[^}]*\$/ && \$td==0 ;
\$td=do{/typedef/?1:0};" -i.bak *
or more readable as
perl -pe '
BEGIN { my $td=0; };
s/{/{function(\047.counter++.\047);/ if /{[^}]*$/ && $td==0;
$td=do{/typedef/?1:0};
' -i.bak *
Or as a perl script file: cat >addFunction.pl
#!/usr/bin/perl -pi.bak
BEGIN { my $td = 0; }
s/{/{function(\047.counter++.\047);/ if /{[^}]*$/ && $td == 0;
$td = do { /typedef/ ? 1 : 0 };
Explained:
BEGIN{...} command block to do at begin of program.
s/// if // && to replacement if current match match and $td=0
$td=do{ aaa ? bbb : ccc } assing to td: if aaa then bbb else ccc.
As perl run sequetialy, $td keep his value until next assignement. So if test for replacement is doing before $td assignement, the check will use previous value.
And finaly, same using sed:
sed -e '/{[^}]*$/{x;/./{x;s/{/{function(\o047.counter++.\o047);/;x;};x;};h;s/^.*typedef.*$//;x;' -i.bak *
or more readable:
sed -e '
/{[^}]*$/{
x;
/./{
x;
s/{/{function(\o047.counter++.\o047);/;
x;
};
x;
};
h;
s/^/./;
s/^.*typedef.*$//;
x;
' -i.bak *
Some sed tricks:
h store (backup) current line to the hold space
x exchange current working line with the hold space
s/// well known replacement string command
\o047 octal tick: '
/{[^}]*$/{ ... } Command block to do only on lines maching { and no }.
/./{ ... } Command block to do only on lines containing at least 1 character
Here is one way to skip the substitution if '}' exists:
if ( $_ !~ /}/ ) { # same as !( $_ =~ /}/ )
s/{/'{function(' . $counter++ . ')'/ge;
}
Make sure that the print is outside the conditional though, or the line won't be printed if the '}' is missing.
Other ways to write it:
unless ( /}/ ) {
s/{/'{function(' . $counter++ . ')'/ge;
}
Or simply:
s/{/'{function(' . $counter++ . ')'/ge unless /}/;
I think this issue is you're looking for the index in #lines. You want this:
while (<>)
{
my #lines;
# copy each line from the text file to the string #lines and add a function call after every '{' '
tie #lines, 'Tie::File', $ARGV or die "Can't read file: $!\n"
foreach my $line(#lines)
{
if ( index ($lines,'}')== -1 ) # if there is a '}' in the same line don't add the macro
{
$line =~ s/{/'{function(' . $counter++ . ')'/ge;
}
print $line;
}
untie #lines; # free #lines
}
I'm also a bit unclear about how $ARGV is being set. You may want to use $_ in that place based on how you have your script.

Saving a transliteration table in perl

I want to transliterate digits from 1 - 8 with 0 but not knowing the number at compile time. Since transliterations do not interpolate variables I'm doing this:
#trs = (sub{die},sub{${$_[0]} =~ tr/[0,1]/[1,0]/},sub{${$_[0]} =~ tr/[0,2]/[2,0]/},sub{${$_[0]} =~ tr/[0,3]/[3,0]/},sub{${$_[0]} =~ tr/[0,4]/[4,0]/},sub{${$_[0]} =~ tr/[0,5]/[5,0]/},sub{${$_[0]} =~ tr/[0,6]/[6,0]/},sub{${$_[0]} =~ tr/[0,7]/[7,0]/},sub{${$_[0]} =~ tr/[0,8]/[8,0]/});
and then index it like:
$trs[$character_to_transliterate](\$var_to_change);
I would appreciate if anyone can point me to a best looking solution.
Any time that you are repeating yourself, you should see if what you are doing can be done in a loop. Since tr creates its tables at compile time, you can use eval to access the compiler at runtime:
my #trs = (sub {die}, map {eval "sub {\$_[0] =~ tr/${_}0/0$_/}"} 1 .. 8);
my $x = 123;
$trs[2]($x);
print "$x\n"; # 103
There is also no need to use references here, subroutine arguments are already passed by reference.
If you do not want to use string eval, you need to use a construct that supports runtime modification. For that you can use the s/// operator:
sub subst {$_[0] =~ s/($_[1]|0)/$1 ? 0 : $_[1]/ge}
my $z = 1230;
subst $z => 2;
print "$z\n"; # 1032
The tr/// construct is faster than s/// since the latter supports regular expressions.
I'd suggest simply ditching tr in favor of something that actually permits a little bit of metaprogramming like s///. For example:
# Replace $to_swap with 0 and 0 with $to_swap, and leave
# everything else alone.
sub swap_with_0 {
my ($digit, $to_swap) = #_;
if ($digit == $to_swap) {
return 0;
} elsif ($digit == 0) {
return $to_swap;
} else {
return $digit;
}
}
# Swap 0 and $to_swap throughout $string
sub swap_digits {
my ($string, $to_swap) = #_;
$string =~ s/([0$to_swap])/swap_with_0($1, $to_swap)/eg;
return $string;
}
which is surprisingly straightforward. :)
Here's a short subroutine that uses substitution instead of transliteration:
sub swap_digits {
my ($str, $digit) = #_;
$str =~ s{ (0) | $digit }{ defined $1 ? $digit : 0 }gex;
return $str;
}