Perl: Speeding up eval

eval is slow when done on a string: The string first has to be parsed before it can be executed.
I am looking for a way to cache the parsing, so that I can reuse the parsed string for yet another eval. The next eval will be the same code, but will not eval to the same value, so I cannot simply cache the results.
From the description I am looking for ceval from Eval::Compile.
But I cannot use Eval::Compile, as that requires a C compiler for the platform, and it is not given that the user has a C compiler.
So can I do something similar to ceval in pure Perl?
Background
GNU Parallel lets the user give Perl expressions that will be eval'ed on every argument. Currently the Perl expressions are given as strings by the user and eval'ed for every argument. The Perl expressions remain unchanged for each argument. It is therefore a waste to recompile the expression as the recompilation will not change anything.
Profiling of the code shows that the eval is one of the bottlenecks.
Example
The user enters: $_ .= "foo" and s/a/b/g
A user's scripts are stored in $usereval1 and $usereval2.
The user gives 10000 random arguments (strings) stored in @arguments.
sub replace {
    my ($script, $arg) = @_;
    local $_;
    $_ = $arg;
    # This is where I would like to cache the parsed $script.
    eval $script;
    return $_;
}
for my $arg (@arguments) {
    # Loads of indirect code (in the order of 1000 lines) that
    # call subs calling subs calling subs that eventually does:
    $replaced1 = replace($usereval1, $arg);
    $replaced2 = replace($usereval2, $arg);
    # Yet more code that does stuff with $replaced1 $replaced2
}

You can store a subroutine ref like this:
perl -lwe 'my $x = eval(q( sub { my $foo = shift; $foo*2; } )); print $x->(12);'
This prints 24. You can reuse the code without the need to recompile it.
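Applied to the replace sub from the question, the same idea might look like the sketch below. It assumes the user's expression can be wrapped in sub { ... } unchanged; the names %compiled and replace_cached are made up for illustration, and the //= operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my %compiled;    # hypothetical cache: script text => compiled code ref

sub replace_cached {
    my ($script, $arg) = @_;
    # Parse the script only the first time it is seen. The newline before
    # the closing brace keeps a trailing "# comment" from swallowing it.
    my $code = ($compiled{$script} //= eval "sub { $script\n}")
        or die "Cannot compile '$script': $@";
    local $_ = $arg;    # the compiled sub works on the global $_
    $code->();
    return $_;
}

print replace_cached('$_ .= "foo"', 'bar'), "\n";    # barfoo
print replace_cached('s/a/b/g', 'banana'), "\n";     # bbnbnb
```

Each distinct script string is handed to eval once; every later call only pays for a hash lookup and a sub call.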

Related

Does perl cache regex generation?

Suppose I have a function that dynamically generates regular expressions and then matches against them.
For example, in the following function match_here a \G anchor is inserted at the beginning of the regex. This simplifies the API because the caller does not need to remember to include the pos anchor in the pattern.
#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
use Data::Dumper;
sub match_here {
    my ($str, $index, $rg) = @_;
    pos($str) = $index;
    croak "index ($index) out of bounds" unless pos($str) == $index;
    my $out;
    if ($str =~ /\G$rg/) {
        $out = $+[0];
    }
    return $out;
}
# no match starting at position 0
# prints '$VAR1 = undef;'
print Dumper(match_here("abc", 0, "b+"));
# match from 1 to 2
# prints '$VAR1 = 2;'
print Dumper(match_here("abc", 1, "b+"));
I'm wondering whether an anonymous regex object is "compiled" every time the function is evaluated or if there's some caching so that identical strings will not cause additional regex objects to be compiled.
Also, assuming that no caching is done by the Perl interpreter, is compiling a regex object expensive enough to be worth caching (possibly in an XS extension)?
From perlop(1), under the m// operator:
PATTERN may contain variables, which will be interpolated every time the pattern search is evaluated
[...]
Perl will not recompile the pattern unless an interpolated variable that it contains changes. You can force Perl to skip the test and never recompile by adding a "/o" (which stands for "once") after the trailing delimiter. Once upon a time, Perl would recompile regular expressions unnecessarily, and this modifier was useful to tell it not to do so, in the interests of speed.
So yes, there is a cache, and you can even force the use of the cache even when it's invalid by saying /o, but you really shouldn't do that.
But that cache only stores one compiled regexp per instance of the m// or s/// operator, so it only helps if the regexp is used with the same variables (e.g. your $rg) many times consecutively. If you alternate between calling it with $rg='b+' and $rg='c+' you will get a recompile every time.
For that kind of situation, you can do your own caching with the qr// operator. It explicitly compiles the regexp and returns an object that you can store and use to execute the regexp later. That could be incorporated into your match_here like this:
use feature 'state';
sub match_here {
    my ($str, $index, $rg) = @_;
    pos($str) = $index;
    croak "index ($index) out of bounds" unless pos($str) == $index;
    my $out;
    state %rg_cache;
    my $crg = $rg_cache{$rg} ||= qr/\G$rg/;
    if ($str =~ /$crg/) {
        $out = $+[0];
    }
    return $out;
}
To add more detail on the basic cache (when not using qr//): the fact that $rg is a newly allocated lexical variable each time makes no difference. It only matters that the value is the same as the previous one.
Here's an example to prove the point:
use re qw(Debug COMPILE);
while(<>) {
    chomp;
    # Insane interpolation. Do not use anything remotely like this in real code
    print "MATCHED: $_\n" if /^${\(`cat refile`)}/;
}
Every time the match operator executes, it reads refile. The regular expression is ^ followed by the contents of refile. The debugging output shows that it is recompiled only if the contents of the file have changed. If the file still has the same contents as the last time, the operator notices that the same string is being passed to the regexp compiler again, and reuses the cached result.
Or try this less dramatic example:
use re qw(Debug COMPILE);
@patterns = (
    '\d{3}',
    '\d{3}',
    '[aeiou]',
    '[aeiou]',
    '\d{3}',
    '\d{3}'
);
for ('xyz', '123', 'other') {
    for $i (0..$#patterns) {
        if(/$patterns[$i]/) {
            print "$_ matches $patterns[$i]\n";
        } else {
            print "$_ does not match $patterns[$i]\n";
        }
    }
}
in which the match operator runs 18 times and 11 of those runs are cache hits, even though the same "variable" (the same element of the @patterns array) is never used twice in a row.

perl a user supplied sub, always returns the text, not the eval

I am a perl newb.
Unfortunately, eval doesn't work the way I'm expecting
My mock example:
my $filter = 'return false;';
my $filterFunc = sub{ eval{ $filter } } ;
while (my $entry = readNextEntry()){
    print "Eval:".$filterFunc->( $entry )."\n";
}
When I run this, I get the literal "return false;" being returned from every pass (rather than the function getting evaluated). I've tried a couple of variations, but I'm not hitting the magic combination.
A note on security implications:
As I am the user, I don't have security concerns about what the passed in code does.
My intent is to pull in multiple sources and filter out stuff based on parameters. Since I don't know what parameters are going to be useful, I thought I could pass in some text, eval the text into an anonymous function, and run the function for each record (a filter function).
You need to do string eval, not block eval.
my $filterFunc = sub{ eval{ $filter } } ;
The eval BLOCK is like a try/catch mechanism. It does not evaluate arbitrary code. It catches errors in the code inside the block. What you want is string eval.
my $filterFunc = sub{ eval $filter };
Here's an example implementation of what I think you are trying to do:
$ perl -E 'my $filter = sub { my $f = shift; eval $ARGV[0]; }; for ( 1 .. 10 ) { say $_ if $filter->($_) }' '$f % 2'
1
3
5
7
9
However, there is no false in Perl. That's not a keyword. Unless you have sub false somewhere, this might give you a syntax error, depending on if you have use strict or not.
You should read up on eval.
If all you want is a $filterFunc that returns something false, use 0 instead. Note that the literal string "return false;" is true in Perl.
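For the stated goal of turning filter text into an anonymous function that runs once per record, one possible sketch is to compile the text a single time with string eval and then call the resulting code ref. The filter text and the apply_filter helper below are hypothetical stand-ins; the filter sees each record as $_[0].

```perl
use strict;
use warnings;

# Hypothetical filter text; in a real script it might come from $ARGV[0].
my $filter = '$_[0] % 2';

# Compile the text into an anonymous sub once, not once per record.
my $filterFunc = eval "sub { $filter\n}"
    or die "Bad filter: $@";

# Return only the entries for which the filter yields a true value.
sub apply_filter {
    my @entries = @_;
    return grep { $filterFunc->($_) } @entries;
}

print join(',', apply_filter(1 .. 10)), "\n";    # 1,3,5,7,9
```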

Sub references as arguments in perl

I have a Perl function and I don't know what it does.
my: what does min mean in Perl?
What does @ARGV mean?
sub getArgs
{
my $argCnt=0;
my %argH;
for my $arg (@ARGV)
{
if ($arg =~ /^-/) # insert this entry and the next in the hash table
{
$argH{$ARGV[$argCnt]} = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;}
Code like that makes David sad...
Here's a reformatted version of the code doing the indentations correctly. That makes it so much easier to read. I can easily tell where my if and loops start and end:
sub getArgs {
    my $argCnt = 0;
    my %argH;
    for my $arg ( @ARGV ) {
        if ( $arg =~ /^-/ ) { # insert this entry and the next in the hash table
            $argH{ $ARGV[$argCnt] } = $ARGV[$argCnt+1];
        }
        $argCnt++;
    }
    return %argH;
}
@ARGV is what is passed to the program. It is an array of all the arguments passed. For example, I have a program foo.pl, and I call it like this:
foo.pl one two three four five
In this case, @ARGV is set to the list of values ("one", "two", "three", "four", "five"). The name comes from a similar variable found in the C programming language.
The author is attempting to parse these arguments. For example:
foo.pl -this that -the other
would result in:
$arg{"-this"} = "that";
$arg{"-the"} = "other";
I don't see min. Do you mean my?
This is a wee bit of a complex discussion which would normally involve package variables vs. lexically scoped variables, and how Perl stores variables. To make things easier, I'm going to give you a sort-of correct, but technically wrong answer: If you use the strict pragma, and you should, you have to declare your variables with my before they can be used. For example, here's a simple two line program that's wrong. Can you see the error?
$name = "Bob";
print "Hello $Name, how are you?\n";
Note that when I set $name to "Bob", $name is with a lowercase n. But I used $Name (uppercase N) in my print statement. As it stands now, Perl will print out "Hello , how are you?" without a care that I've used the wrong variable name. If it's hard to spot an error like this in a two line program, imagine what it would be like in a 1000 line program.
By using strict and forcing me to declare variables with my, Perl can catch that error:
use strict;
use warnings; # Another Pragma that should always be used
my $name = "Bob";
print "Hello $Name, how are you doing\n";
Now, when I run the program, I get the following error:
Global symbol "$Name" requires explicit package name at (line # of print statement)
This means that $Name isn't defined, and Perl points to where that error is.
When you define variables like this, they are in scope within the block where they're defined. A block could be the code contained in a set of curly braces or a while, if, or for statement. If you define a variable with my outside of these, it's defined to the end of the file.
Thus, by using my, the variables are only defined inside this subroutine. And, the $arg variable is only defined in the for loop.
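A small sketch of that scoping rule, using a made-up scope_demo sub: the $name declared inside the loop is a separate variable that disappears at the closing brace, while the outer $name keeps its value.

```perl
use strict;
use warnings;

sub scope_demo {
    my $name = "Bob";            # visible until the end of the sub
    my @seen;
    for my $i (1 .. 2) {
        my $name = "loop $i";    # a new variable, visible only inside this block
        push @seen, $name;
    }
    push @seen, $name;           # the outer $name is untouched
    return @seen;
}

print join(", ", scope_demo()), "\n";    # loop 1, loop 2, Bob
```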
One more thing:
The person who wrote this should have used the Getopt::Long module. There's a major bug in their code:
For example:
foo.pl -this that -one -two
In this case, my hash looks like this:
$args{'-this'} = "that";
$args{'-one'} = "-two";
$args{'-two'} = undef;
If I did this:
if ( defined $args{'-two'} ) {
...
}
the body of the if statement would not execute, even though -two was given on the command line.
Also:
foo.pl -this=that -one -two
would also fail.
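For comparison, here is a rough sketch of the same parsing with Getopt::Long. The option names are taken from the examples above, the get_args wrapper is hypothetical, and GetOptionsFromArray is used only so the sketch can be run without touching the real @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of arguments into a hash of option name => value.
sub get_args {
    my @argv = @_;
    my %opt;
    GetOptionsFromArray(
        \@argv, \%opt,
        'this=s',    # option that takes a string value
        'the=s',
        'one',       # plain flags, stored as 1 when present
        'two',
    ) or die "Bad command line\n";
    return %opt;
}

my %opt = get_args(qw(-this that -one -two));
print "$_ => $opt{$_}\n" for sort keys %opt;
```

Here -one and -two are each recorded as flags with the value 1, so a defined test on either succeeds. With Getopt::Long's default configuration the -this=that form should be accepted as well.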
@ARGV is a special variable (refer to perldoc perlvar):
@ARGV
The array @ARGV contains the command-line arguments intended for the
script. $#ARGV is generally the number of arguments minus one, because
$ARGV[0] is the first argument, not the program's command name itself.
See $0 for the command name.
Perl documentation is also available from your command line:
perldoc -v @ARGV

Perl using the special character &

I had a small question. I was reading some code and as my school didn't teach me anything useful about perl programming, I am here to ask you people. I see this line being used a lot in some perl programs:
$variable = &something();
I don't know what the & sign means here as I never saw it in Perl. And the something is a subroutine (I am guessing). It usually has a name and sometimes takes arguments like a function too. Can someone tell me what & stands for here and what that something is?
The variable takes in some sort of returned value and is then used to check some conditions, which makes me think it is a subroutine. But still why the &?
Thanks
Virtually every time you see & outside of \&foo and EXPR && EXPR, it's an error.
&foo(...) is the same as foo(...) except foo's prototype will be ignored.
sub foo(&@) { ... } # Causes foo to take a BLOCK as its first arg
foo { ... } ...;
&foo(sub { ... }, ...); # Same thing.
Only subroutines (not operators) will be called by &foo(...).
sub print { ... }
print(...); # Calls the print builtin
&print(...); # Calls the print sub.
You'll probably never need to use this feature in your entire programming career. If you see it used, it's surely someone using & when they shouldn't.
&foo is similar to &foo(@_). The difference is that changes to @_ in foo affect the current sub's @_.
You'll probably never need to use this feature in your entire programming career. If you see it used, it's surely someone using & when they shouldn't or a foolish attempt at optimization. However, the following is pretty elegant:
sub log_info { unshift @_, 'info'; &log }
sub log_warn { unshift @_, 'warn'; &log }
sub log_error { unshift @_, 'error'; &log }
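The difference between the two call styles can be seen in a small sketch. All three sub names here are invented for the demonstration: a helper shifts one element off its argument list, and the callers report how many arguments they still have afterwards.

```perl
use strict;
use warnings;

# Removes the first element of whichever @_ it is handed.
sub drop_first { shift @_; return }

# &drop_first (no parentheses) shares this sub's @_ with drop_first.
sub count_after_ampersand { &drop_first; return scalar @_ }

# drop_first(@_) gives the callee its own @_, so ours keeps its length.
sub count_after_parens { drop_first(@_); return scalar @_ }

print count_after_ampersand(1, 2, 3), "\n";    # 2
print count_after_parens(1, 2, 3), "\n";       # 3
```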
goto &foo is similar to &foo, except the current subroutine is removed from the call stack first. This will cause it to not show up in stack traces, for example.
You'll probably never need to use this feature in your entire programming career. If you see it used, it's surely a foolish attempt at optimization.
sub log_info { unshift @_, 'info'; goto &log; } # These are slower than
sub log_warn { unshift @_, 'warn'; goto &log; } # not using goto, but
sub log_error { unshift @_, 'error'; goto &log; } # maybe log uses caller()?
$& contains what the last successful regex match matched. Before 5.20, using this causes every regex in your entire interpreter to become slower (if they have no captures), so don't use this.
print $& if /fo+/; # Bad before 5.20
print $MATCH if /fo+/; # Bad (Same thing. Requires "use English;")
print ${^MATCH} if /fo+/p; # Ok (Requires Perl 5.10)
print $1 if /(fo+)/; # Ok
defined &foo is a perfectly legitimate way of checking if a subroutine exists, but it's not something you'll likely ever need. There's also exists &foo, which is similar but not as useful.
EXPR & EXPR is the bitwise AND operator. This is used when dealing with low-level systems that store multiple pieces of information in a single word.
system($cmd);
die "Can't execute command: $!\n" if $? == -1;
die "Child killed by ".($? & 0x7F)."\n" if $? & 0x7F;
die "Child exited with ".($? >> 8)."\n" if $? >> 8;
&{ EXPR }() (and &$ref()) is a subroutine call via a reference. This is a perfectly acceptable and somewhat common thing to do, though I prefer the $ref->() syntax. Example in next item.
\&foo takes a reference to subroutine foo. This is a perfectly acceptable and somewhat common thing to do.
my %dispatch = (
foo => \&foo,
bar => \&bar,
);
my $handler = $dispatch{$cmd} or die;
$handler->();
# Same: &{ $handler }();
# Same: &$handler();
EXPR && EXPR is the boolean AND operator. I'm sure you're familiar with this extremely common operator.
if (0 <= $x && $x <= 100) { ... }
In older versions of Perl, & was used to call subroutines. This is no longer necessary, and \& is mostly used to take a reference to a subroutine,
my $sub_ref = \&subroutine;
or to ignore function prototype (http://perldoc.perl.org/perlsub.html#Prototypes)
Other than for referencing subroutines, & is the bitwise AND operator,
http://perldoc.perl.org/perlop.html#Bitwise-And

Perl - How to create commands that users can input in console?

I'm just starting in Perl and I'm quite enjoying it. I'm writing some basic functions, but what I really want to be able to do is to use those functions intelligently using console commands. For example, say I have a function adding two numbers. I'd want to be able to type in console "add 2, 4" and read the first word, then pass the two numbers as parameters in an "add" function. Essentially, I'm asking for help in creating some basic scripting using Perl ^^'.
I have some vague ideas about how I might do this in VB, but Perl, I have no idea where I'd start, or what functions would be useful to me. Is there something like VB.net's "Split" function where you can break down the contents of a scalar into an array? Is there a simple way to analyse one word at a time in a scalar, or iterate through a scalar until you hit a separator, for example?
I hope you can help, any suggestions are appreciated! Bear in mind, I'm no expert, I started Perl all of a few weeks ago, and I've only been doing VB.net half a year.
Thank you!
Edit: If you're not sure what to suggest and you know any simple/intuitive resources that might be of help, that would also be appreciated.
It's rather easy to make a script which dispatches to a command by name. Here is a simple example:
#!/usr/bin/env perl
use strict;
use warnings;
# take the command name off the @ARGV stack
my $command_name = shift;
# get a reference to the subroutine by name
my $command = __PACKAGE__->can($command_name) || die "Unknown command: $command_name\n";
# execute the command, using the rest of @ARGV as arguments
# and print the return with a trailing newline
print $command->(@ARGV);
print "\n";
sub add {
    my ($x, $y) = @_;
    return $x + $y;
}
sub subtract {
    my ($x, $y) = @_;
    return $x - $y;
}
This script (say it's named myscript.pl) can be called like
$ ./myscript.pl add 2 3
or
$ ./myscript.pl subtract 2 3
Once you have played with that for a while, you might want to take it further and use a framework for this kind of thing. There are several available, like App::Cmd or you can take the logic shown above and modularize as you see fit.
You want to parse command line arguments. A space serves as the delimiter, so just run ./add.pl 2 3 with something like this:
$num1=$ARGV[0];
$num2=$ARGV[1];
print $num1 + $num2;
will print 5
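As for a counterpart to VB.net's Split: Perl's built-in split breaks a scalar into a list on a regex separator. The sketch below is one way to handle a typed line such as "add 2, 4"; the %commands table and the run_line helper are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical table of command names and the code that implements them.
my %commands = (
    add      => sub { $_[0] + $_[1] },
    subtract => sub { $_[0] - $_[1] },
);

sub run_line {
    my ($line) = @_;
    $line =~ s/^\s+//;    # leading whitespace would produce an empty first field
    # Split on runs of whitespace and/or commas: "add 2, 4" -> ("add", "2", "4")
    my ($name, @args) = split /[\s,]+/, $line;
    defined $name or die "Empty command line\n";
    my $code = $commands{$name} or die "Unknown command: $name\n";
    return $code->(@args);
}

print run_line('add 2, 4'), "\n";    # 6
```

To read commands interactively, the same helper could be called on each line read from STDIN.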
Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are separated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 declares a minimum version; it also enables the say builtin (like print, but it appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
    next if /^\s*\#/; # jump comment lines
    # parse the line. We get a destination $var, a $command, and any number of @args
    my ($var, $command, @args) = parse($_);
    # Execute the anonymous sub specified by $command with the @args
    my $value = $functions{ $command }->(@args);
    # Store the return value if a destination $var was specified
    $env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flag allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion, which matches where the last successful match ended. The /c flag is important for this m//gc style parsing to consume one match at a time, and to prevent the position of the regex engine in our string from being reset when a match fails.
sub parse {
    my ($line) = @_; # get the $line, which is an argument
    my ($var, $command, @args); # declare variables to be filled
    # Test if this statement has a variable declaration
    if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
        $var = $1; # assign first capture if successful
    }
    # Parse the function of this statement.
    if ($line =~ m/\G\s* (\w+) \s*/xgc) {
        $command = $1;
        # Test if the specified function exists in our %functions
        if (not exists $functions{$command}) {
            die "The command $command is not known\n";
        }
    } else {
        die "Command required\n"; # Throw fatal exception on parse error.
    }
    # As long as our matches haven't consumed the whole string...
    while (pos($line) < length($line)) {
        # Try to match variables
        if ($line =~ m/\G \$(\w+) \s*/xgc) {
            die "The variable $1 does not exist\n" if not exists $env{$1};
            push @args, $env{$1};
        }
        # Try to match strings
        elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
            push @args, $1;
        }
        # Try to match ints or floats
        elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
            push @args, 0+$1;
        }
        # Throw error if nothing matched
        else {
            die "Didn't understand that line\n";
        }
    }
    # return our -- now filled -- vars.
    return $var, $command, @args;
}
Perl arrays can be handled like a linked list: shift removes and returns the first element (pop does the same to the last element). push adds an element to the end, unshift to the beginning.
Our little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our Perl script is saved as my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language is saved as meaning_of_life.mll, and also set to be executable, then
$ ./meaning_of_life.mll
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).