Don't stop perl on: Quantifier in {,} bigger than 32766 in regex - perl

The question "Retain newlines for POD in case of PPR::uncomment" was solved in the answer by defining a new method, decomment2, that preserves the newlines properly. All works quite well, but now I have a file to process with a very big HERE document (far too big to publish here, so unfortunately I can only link to an external site): https://github.com/openssl/openssl/blob/master/crypto/bn/asm/ppc.pl
When this file is processed through the PPR::decomment2 I get the message:
Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/(?s:.{ <-- HERE 39303})/ at /home/User/perl5/lib/perl5/Doxygen/Filter/Perl.pm line 1222.
Line 1222 is: $str =~ m{ \A (?&PerlDocument) \Z in the PPR::decomment2 method.
At that moment, the Perl process terminates.
Is there a way to increase this "32766" limit (i.e. 2**15-2)?
Is there a way not to terminate the Perl process but set the PPR::ERROR or another flag, so this can be processed in the code?

As a point of information, the max value was doubled in Perl 5.30 to 65534.

I did some research and came across the eval possibility.
Is there anything against a construct like:
eval {
my $mystr = PPR::decomment2($str);
# do some local dependent stuff
};
if ($@) {
# do the local handling of the error
}
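A minimal sketch of that eval approach, in case it helps. The names try_call and compile_big_quantifier are hypothetical; the second is only a stand-in for PPR::decomment2, which is not reproduced here. Whether the quantifier actually fails depends on the perl version in use.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code ref and return (result, error) instead of dying.
sub try_call {
    my ($code, @args) = @_;
    my $result;
    my $ok = eval {
        $result = $code->(@args);
        1;    # the eval block is true only if nothing died
    };
    return ($result, undef) if $ok;
    my $err = $@;
    $err = 'unknown error' unless defined $err && length $err;
    return (undef, $err);
}

# Stand-in for the real call: compiles a pattern with a large quantifier at run time.
sub compile_big_quantifier {
    my ($n) = @_;
    return qr/(?s:.{$n})/;
}

my ($re, $err) = try_call(\&compile_big_quantifier, 39303);
print defined $err ? "caught: $err" : "compiled fine on this perl\n";
```

Returning 1 from the eval block and testing that value avoids relying on $@ alone, which can be cleared by code that runs while the stack unwinds.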
Edit (April 23, 2013): a patch has been made by the maintainer of the PPR package (Damian Conway). The package number is 25.

Related

Interpreting & modifying Perl one-liner?

I have the following Perl 'one-liner' script (found it online, so not mine):
perl -lsne '
/$today.* \[([0-9.]+)\]:.+dovecot_(?:login|plain):([^\s]+).* for (.*)/
and $sender{$2}{r}+=scalar (split / /,$3)
and $sender{$2}{i}{$1}=1;
END {
foreach $sender(keys %sender){
printf"Recip=%05d Hosts=%03d Auth=%s\n",
$sender{$sender}{r},
scalar (keys %{$sender{$sender}{i}}),
$sender;
}
}
' -- -today=$(date +%F) /var/log/exim_mainlog | sort
I've been trying to understand its innards, because I would like to modify it to re-use its functionality.
Some questions I got:
What does the flag -lsne do? (From what I know, it's got to be, at least, 3 different flags in one)
Where does $sender get its value from?
What about that (?:login|plain) segment, are they 'variables'? (I get that it's a regex, I'm just not familiar with it)
What I'm trying to achieve:
Get the number of emails sent by each user in a SMTP relay periodically (cron job)
If there's an irregular number of emails (say, 500 in a 1-hour timespan), do something (like shutting off the service, or sending a notification)
Why I'm trying to achieve this:
Lately, someone has been using my SMTP server to send spam, so I would like to monitor email activity and stop them from abusing the SMTP relay resources. (Security-related suggestions are always welcome, but off topic for this question. I'm trying to focus on the script for now.)
What I'm NOT trying to achieve:
To get the script done by third-parties. (Just try and point me in the right direction, maybe an example)
So, any suggestions, guidance, and friendly comments are welcome. I understand this may be an off-topic question, yet I've been struggling with this for almost a week and I have no background in Perl.
Thanks in advance.
What does the flag -lsne do? (From what I know, it's got to be, at least, 3 different flags in one)
-l causes lines of input read in to be auto-chomped, and lines of output printed out to have "\n" auto-appended.
-s enables switch parsing. This is what creates the variable $today, because a command-line switch of -today=$(date +%F) was passed after the -- separator.
-n surrounds the entire "one-liner" in a while (<>) { ... } loop, effectively reading every line of input (here, /var/log/exim_mainlog) and running the body of the one-liner on that line.
-e is the switch that tells perl to execute the following code from the command line, rather than running a file containing Perl code.
Where does $sender get its value from?
I suspect you are confusing $sender with %sender. The code uses $sender{$2}{r} without ever declaring %sender. This is a feature of Perl called "autovivification". Basically, because we used $sender{$2}{r}, perl automatically created the hash %sender, added a key whose name is whatever is in $2, and set the value of that key to a reference to a new hash. It then gave that new hash a key 'r' and added scalar (split / /,$3) to its value. The scalar $sender is a separate variable: it only gets a value in the END block, where it is the foreach loop variable that takes each key of %sender in turn.
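As a rough illustration of autovivification, here is a small sketch. The tally function and the sample records are made up for this example; only the hash layout ({r} for the recipient count, {i} for the set of hosts) follows the one-liner.

```perl
use strict;
use warnings;

# Count recipients and distinct hosts per authenticated user.
# Each record is [user, host, recipient count]; %sender starts out empty.
sub tally {
    my (@records) = @_;
    my %sender;
    for my $rec (@records) {
        my ($user, $host, $count) = @$rec;
        $sender{$user}{r} += $count;     # autovivifies $sender{$user} as a hash ref
        $sender{$user}{i}{$host} = 1;    # autovivifies the nested {i} hash as well
    }
    return \%sender;
}

my $stats = tally(['alice', '10.0.0.1', 2], ['alice', '10.0.0.2', 1]);
for my $user (sort keys %$stats) {
    printf "Recip=%05d Hosts=%03d Auth=%s\n",
        $stats->{$user}{r}, scalar(keys %{ $stats->{$user}{i} }), $user;
}
```

No intermediate hash is ever created by hand; the nested structure comes into existence the first time each key path is assigned.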
What about that (?:login|plain) segment, are they 'variables'? (I get that it's a regex, I'm just not familiar with it)
It's saying that this portion of the regular expression will match either 'login' or 'plain'. The ?: at the beginning tells Perl that these parentheses are used only for clustering, not capturing. In other words, the result of this portion of the pattern match will not be stored in the $1, $2, $3, etc. variables.
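To see the effect of the non-capturing group, here is a small sketch built on a shortened version of the pattern. The function name parse_auth and the sample log line are invented for illustration and may not match real exim log output.

```perl
use strict;
use warnings;

# Returns (ip, user) from a log line, or an empty list when it does not match.
sub parse_auth {
    my ($line) = @_;
    # (?:login|plain) groups the alternatives without creating a capture,
    # so the IP address is $1 and the user name is $2.
    if ($line =~ /\[([0-9.]+)\]:.+dovecot_(?:login|plain):(\S+)/) {
        return ($1, $2);
    }
    return;
}

# Made-up log line, for illustration only.
my ($ip, $user) = parse_auth('H=host [192.0.2.7]:4242 A=dovecot_plain:bob for a@example.com');
print "ip=$ip user=$user\n" if defined $ip;
```

If the group were written as (login|plain), it would become $2 and the user name would shift to $3.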
-MO=Deparse is your friend for understanding one-liners (and one liners that wrap into five lines on your terminal):
$ perl -MO=Deparse -lsne '/$today.* \[([0-9.]+)\]:.+dovecot_( ...
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE:
while ( defined($_ = <ARGV>) ) {
chomp $_;
$sender{$2}{'i'}{$1} = 1 if
/$today.* \[([0-9.]+)\]:.+dovecot_(?:login|plain):([^\s]+).* for (.*)/
and $sender{$2}{'r'} += scalar split(/ /, $3, 0);
sub END {
foreach $sender (keys %sender) {
printf "Recip=%05d Hosts=%03d Auth=%s\n",
$sender{$sender}{'r'},
scalar keys %{$sender{$sender}{'i'};}, $sender;
}
}
}
-e syntax OK
[newlines and indentation added for clarity]
What does the flag -lsne do? (From what I know, it's got to be, at least, 3 different flags in one)
You can get a summary of the available perl command-line options by running perl -h in the terminal. The command below filters that summary down to the specific options you asked about.
~$ perl -h|perl -ne 'print if /^\s+(-l|-s|-n|-e)/'
-e program one line of program (several -e's allowed, omit programfile)
-l[octal] enable line ending processing, specifies line terminator
-n assume "while (<>) { ... }" loop around program
-s enable rudimentary parsing for switches after programfile
Two examples of the '-s' command line option in use.
~$ perl -se 'print "Todays date is $today\n"' -- -today=`date +%F`
Todays date is 2016-10-17
~$ perl -se 'print "The sky is $color.\n"' -- -color='blue'
The sky is blue.
For detailed explanations of those command line options read the online documentation below.
http://perldoc.perl.org/perlrun.html
Or run the command below from your terminal.
~$ perldoc perlrun

Perl script getting stuck in terminal for no apparent reason

I have a Perl script which reads three files and writes new files after reading each one of them. Everything is one thread.
In this script, I open and work with three text files and store the contents in a hash. The files are large (close to 3 MB).
I am using a loop to go through each of the files (open -> read -> Do some action (hash table) -> close)
I am noticing that whenever I am scanning through the first file, the Perl terminal window in my Cygwin shell gets stuck. The moment I hit the enter key I can see the script process the rest of the files without any issues.
It's very odd as there is no read from STDIN in my script. Moreover, the same logic applies to all the three files as everything is in the same loop.
Has anyone here faced a similar issue? Does this usually happen when dealing with large files or big hashes?
I can't post the script here, but there is not much in it to post anyway.
Could this just be a problem in my Cygwin shell?
If this problem does not go away, how can I circumvent it? Like providing the enter input when the script is in progress? More importantly, how can I debug such a problem?
sub read_set
{
    @lines_in_set = ();
    push @lines_in_set, $_[0];
    while (<INPUT_FILE>)
    {
        $line = $_;
        chomp($line);
        if ($line =~ /ENDNEWTYPE/i or $line =~ /ENDSYNTYPE/ or eof())
        {
            push @lines_in_set, $line;
            last;
        }
        else
        {
            push @lines_in_set, $line;
        }
    }
    return @lines_in_set;
}
Update: I think I found the problem. The "or eof()" call was causing the script to get stuck. Somehow it only happens the first time, and I have no idea why.
The eof() call is the problem. See perldoc -f eof.
eof with empty parentheses refers to the pseudo file accessed via while (<>), which consists of either all the files named in @ARGV, or STDIN if there are none.
And in particular:
Note that this function actually reads a character and then "ungetc"s it, so isn't useful in an interactive context.
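But your loop reads from another handle, one called INPUT_FILE. With no files named on the command line, eof() tries to read a character from STDIN, which is why the script waits until you press Enter.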
It would make more sense to call eof(INPUT_FILE). But even that probably isn't necessary; your outer loop will terminate when it reaches the end of INPUT_FILE.
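A possible rewrite of read_set along those lines is sketched below. Passing a lexical filehandle as the first argument is my own change, not something in the original script, and the sample data is made up. The while loop ends by itself at end of file, so no eof call is needed.

```perl
use strict;
use warnings;

# Collect lines from $fh up to and including an ENDNEWTYPE/ENDSYNTYPE marker.
# The loop also stops at end of file, because <$fh> then returns undef.
sub read_set {
    my ($fh, $first_line) = @_;
    my @lines_in_set = ($first_line);
    while (my $line = <$fh>) {
        chomp $line;
        push @lines_in_set, $line;
        last if $line =~ /ENDNEWTYPE/i or $line =~ /ENDSYNTYPE/;
    }
    return @lines_in_set;
}

# Demo on an in-memory file (sample data, for illustration only).
my $data = "NEWTYPE a\nfield 1\nENDNEWTYPE\nNEWTYPE b\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my $first = <$in>;
chomp $first;
my @set = read_set($in, $first);
print scalar(@set), " lines in first set\n";
close $in;
```

If an explicit end-of-file test is ever needed, eof($fh) with the handle as its argument checks that handle rather than STDIN.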
Some more suggestions, not related to the symptoms you're seeing:
Add
use strict;
use warnings;
near the top of your script, and correct any error messages this produces (perl -cw script-name does a compile-only check). You'll need to declare your variables using my (perldoc -f my). And use consistent indentation; I recommend the same style you'll find in most Perl documentation.

How can I disable Perl warnings for svnnotify?

I'm using svnnotify. It works (sends email and all that) but it always outputs some error messages, such as
Use of uninitialized value in substr at /usr/lib/perl5/site_perl/5.8.8/SVN/Notify.pm line 1313.
substr outside of string at /usr/lib/perl5/site_perl/5.8.8/SVN/Notify.pm line 1313.
Use of uninitialized value in index at /usr/lib/perl5/site_perl/5.8.8/SVN/Notify.pm line 1313.
Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.8/SVN/Notify.pm line 1314.
I've tried running it with > /dev/null but no luck. I've tried running it > bla and file bla comes up empty, and the output is shown on screen. Since svnnotify does not have a quiet switch, how can I do this?
It looks like this happens when there is no log message for the commit. Some users may need to have a LART session to fix that. Barring that, the right thing to do is fix the SVN::Notify module and submit the patch to the SVN::Notify RT queue, but too late, I've already submitted the ticket.
Here's the patch:
diff --git a/lib/SVN/Notify.pm b/lib/SVN/Notify.pm
index 3f3672b..5387dd2 100644
--- a/lib/SVN/Notify.pm
+++ b/lib/SVN/Notify.pm
@@ -1308,7 +1308,7 @@ sub prepare_subject {
}
# Add the first sentence/line from the log message.
- unless ($self->{no_first_line}) {
+ if (!$self->{no_first_line} and defined $self->{message}[0] and length $self->{message}[0] ) {
# Truncate to first period after a minimum of 10 characters.
my $i = index substr($self->{message}[0], 10), '. ';
$self->{subject} .= $i > 0
I think that takes care of the warnings, but I suspect it's only a band-aid because the design around $self->{no_first_line} seems dodgy.
While the accepted answer certainly works, it isn't (in my opinion) the best answer, because one day you're going to have to maintain this code (or worse, someone else will), and getting it to run with no warnings will make your code that much better. Given the warnings, it looks like the problem isn't you, but it's hard (but not impossible) to imagine that a module in version 2.79 would give errors. Maybe your copy of SVN::Notify is old? Maybe your code (or someone's code) is mucking with the module's internals? It's hard to tell with the few warnings I can see, and I have a feeling there's a bit more to this problem than meets the eye.
The errors are printed on stderr, so you can redirect them to /dev/null with 2> /dev/null. Alternatively, you can run it without perl -w.

Why is 'last' called 'last' in Perl?

What is the historical reason that last is called last in Perl rather than break, as it is called in C?
The design of Perl was influenced by C (in addition to awk, sed and sh - see man page below), so there must have been some reasoning behind not going with the familiar C-style naming of break/last.
A bit of history from the Perl 1.000 (released 18 December, 1987) man page:
[Perl] combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.)
The semantics of 'break' or 'last' are defined by the language (in this case Perl), not by you.
Why not think of 'last' as "this is the last statement to run for the loop".
It's always struck me as odd that the 'continue' statement in 'C' starts the next pass of a loop. This is definitely a strange use of the concept of "continue". But it is the semantics of 'C', so I accept it.
By trying to map particular programming concepts into single English words with existing meaning there is always going to be some sort of mismatching oddity
Source
Plus, Larry Wall is kinda weird. Have you seen his picture?
(source: wired.com)
I expect that this is because Perl was created by a linguist, not a computer scientist. In normal English usage, the concept of declaring that you have completed your final pass through a loop is more strongly connected to the word "last" ("this is the last pass") than to the word "break" ("break the loop"? "break out of the loop"? - it's not even clear how "break" is intended to relate to exiting the loop).
The term 'last' makes more sense when you remember that you can use it with more than just the immediate looping control. You can apply it to labeled blocks one or more levels above the block it is in:
LINE: while( <> ) {
WORD: foreach ( split ) {
last LINE if /^__END__\z/;
...
}
}
It reads more naturally to say "last" in English when you read it as "last line if it matches ...".
There's an additional reason you might want to consider:
Last does more than just loop control.
sub hello {
    my ( $arg ) = @_;
    scope: {
        foo();
        bar();
        last if $arg > 4;
        baz();
        quux();
    }
}
Last as such is a general flow control mechanism not limited to loops. While you can, of course, generalise the above as a loop that runs at most once, the absence of a loop to me indicates "Break? What are we breaking out of?"
Instead, I think of "last" as "Jump to the position of the last brace", which is for this purpose, more semantically sensible.
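For anyone who wants to try this, here is a runnable variant of the bare-block idea. The function steps_run and the STEPS label are invented for this sketch; it records which steps ran instead of calling foo(), bar() and friends.

```perl
use strict;
use warnings;

# Run a fixed sequence of steps inside a bare block and leave early with last.
sub steps_run {
    my ($arg) = @_;
    my @ran;
    STEPS: {
        push @ran, 'foo';
        push @ran, 'bar';
        last STEPS if $arg > 4;    # jump to just past the closing brace of STEPS
        push @ran, 'baz';
        push @ran, 'quux';
    }
    return @ran;
}

print join(',', steps_run(5)), "\n";
print join(',', steps_run(1)), "\n";
```

The label is optional here; a plain last would leave the innermost enclosing block in the same way.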
I asked Damian Conway the same question about say. Perl 6 will introduce say, which is nothing more than print that automatically adds a newline. My question was why not simply use echo, because this is what echo does in Bash (and probably elsewhere).
His answer was: echo is 33% longer than say.
He has a point there. :)
Because it goes to the last of the loop. And because Larry Wall was a weird guy.

How can I eval environment variables in Perl?

I would like to evaluate an environment variable and set the result to a variable:
$x=eval($ENV{EDITOR});
print $x;
outputs:
/bin/vi
works fine.
If I set an environment variable QUOTE to \' and try the same thing:
$x=eval($ENV{QUOTE});
print $x;
outputs:
(nothing)
$@ set to: "Can't find a string terminator anywhere before ..."
I do not wish to simply set $x=$ENV{QUOTE}; as the eval is also used to call a script and return its last value (very handy), so I would like to stick with the eval(); Note that all of the Environment variables eval'ed in this manner are set by me in a different place so I am not concerned with malicious access to the environment variables eval-ed in this way.
Suggestions?
Well, of course it does nothing.
If your ENV variable contains text that is only half valid code, and you give that string to something that evaluates it as Perl, of course it's not going to work.
You only have 3 options:
Programmatically process the string so it doesn't have invalid syntax in it
Manually make sure your ENV variables are not rubbish
Find a solution that does not involve eval but gives the right result.
You may as well complain that
$x = '
Is not valid code, because that's essentially what's occurring.
Samples of Fixing the value of 'QUOTE' to work
# Bad.
QUOTE="'" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# Can't find string terminator "'" anywhere before EOF at (eval 1) line 1.
# Bad.
QUOTE="\'" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# Can't find string terminator "'" anywhere before EOF at (eval 1) line 1.
# Bad.
QUOTE="\\'" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# Can't find string terminator "'" anywhere before EOF at (eval 1) line 1.
# Good
QUOTE="'\''" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# '
Why are you eval'ing in the first place? Shouldn't you just say
my $x = $ENV{QUOTE};
print "$x\n";
The eval is executing the string in $ENV{QUOTE} as if it were Perl code, which I certainly hope it isn't. That is why \ disappears. If you were to check the $@ variable you would find an error message like
syntax error at (eval 1) line 2, at EOF
If your environment variables are going to contain code that Perl should be executing then you should look into the Safe module. It allows you to control what sort of code can execute in an eval so you don't accidentally wind up executing something like "use File::Find; find sub{unlink $File::Find::file}, '.'"
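A minimal sketch of what using Safe might look like, assuming the default operator mask is acceptable. The wrapper name safe_eval is hypothetical; Safe->new and reval are the module's own calls.

```perl
use strict;
use warnings;
use Safe;

# Evaluate a string of Perl in a restricted compartment; returns (value, error).
sub safe_eval {
    my ($code) = @_;
    my $compartment = Safe->new;              # default operator mask
    my $value = $compartment->reval($code);   # like eval STRING, but restricted
    return ($value, $@);
}

my ($value, $error) = safe_eval('6 * 7');
print "value: $value\n" unless $error;

# Starting a subprocess is not permitted by the default mask.
($value, $error) = safe_eval('system("echo gotcha")');
print "blocked: $error" if $error;
```

This limits what evaluated code can do, but it does not turn a value like a lone quote character into valid Perl; for plain values, reading $ENV directly remains simpler.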
Evaluating an environment value is very dangerous, and would generate errors if running under taint mode.
# purposely broken
QUOTE='`rm system`'
$x=eval($ENV{QUOTE});
print $x;
Now just imagine if this script was running with root access, and was changed to actually delete the file system.
Kent's answer, while technically correct, misses the point. The solution is not to use eval better, but to not use eval at all!
The crux of this problem seems to be in understanding what eval STRING does (there is eval BLOCK which is completely different despite having the same name). It takes a string and runs it as Perl code. 99.99% of the time this is unnecessary and dangerous and results in spaghetti code, and you absolutely should not be using it so early in your Perl programming career. You have found the gun in your dad's sock drawer. Having discovered that it can blow holes in things, you are now trying to use it to hang a poster. It's better to forget it exists; your code will be so much better for it.
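To make the difference between the two forms concrete, here is a small sketch. The wrapper names eval_string and eval_block are made up for this example, and the eval STRING form is shown only to demonstrate how it fails.

```perl
use strict;
use warnings;

# eval STRING: compile and run text as Perl at run time; returns (value, error).
sub eval_string {
    my ($code) = @_;
    my $value = eval $code;
    return ($value, $@);
}

# eval BLOCK: run already-compiled code and trap any exception it throws.
sub eval_block {
    my ($code_ref) = @_;
    my $value = eval { $code_ref->() };
    return ($value, $@);
}

my ($v1, $e1) = eval_string("'");                    # a lone quote is not valid Perl
print "eval STRING failed: $e1" if $e1;

my ($v2, $e2) = eval_block(sub { die "oops\n" });    # exception trapped, program continues
print "eval BLOCK trapped: $e2" if $e2;
```

The block form never parses new code, so it cannot produce a syntax error at run time; it only catches exceptions.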
$x = eval($ENV{EDITOR}); does not do what you think it does. I don't even have to know what you think it does, that you even used it there means you don't know. I also know that you're running with warnings off because Perl would have screamed at you for that. Why? Let's assume that EDITOR is set to /bin/vi. The above is equivalent to $x = /bin/vi which isn't even valid Perl code.
$ EDITOR=/bin/vi perl -we '$x=eval($ENV{EDITOR}); print $x'
Bareword found where operator expected at (eval 1) line 1, near "/bin/vi"
(Missing operator before vi?)
Unquoted string "vi" may clash with future reserved word at (eval 1) line 2.
Use of uninitialized value $x in print at -e line 1.
I'm not sure how you got it to work in the first place. I suspect you left something out of your example. Maybe tweaking EDITOR until it worked?
You don't have to do anything magical to read an environment variable. Just $x = $ENV{EDITOR}. Done. $x is now /bin/vi as you wanted. It's just the same as $x = $y. Same thing with QUOTE.
$ QUOTE=\' perl -wle '$x=$ENV{QUOTE}; print $x'
'
Done.
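A small sketch of reading environment variables directly, with a fallback for unset ones. The helper name env_or and the fallback values are arbitrary choices for this example.

```perl
use strict;
use warnings;

# Return the value of an environment variable, or a fallback when it is unset.
sub env_or {
    my ($name, $fallback) = @_;
    return defined $ENV{$name} ? $ENV{$name} : $fallback;
}

my $editor = env_or('EDITOR', '/bin/vi');    # fallback path is only an example
my $quote  = env_or('QUOTE', q{'});
print "editor=$editor quote=$quote\n";
```

The value is used as plain data, so quotes and backslashes in it need no escaping on the Perl side.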
Now, I suspect what you really want to do is run that editor and use that quote in some shell command. Am I right?
Well, you could double-escape the QUOTE's value, I guess, since you know that it's going to be evaled.
Maybe what you want is not Perl's eval but to evaluate the environment variable as the shell would. For this, you want to use backticks.
$x = `$ENV{QUOTE}`