Evaluating escape sequences in perl

Evaluating escape sequences in perl - perl

I'm reading strings from a file. Those strings contain escape sequences which I would like to have evaluated before processing them further. So I do:
$t = eval("\"$t\"");
which works fine. But I'm having doubt about the performance. If eval is forking a perl process each time, it will be a performance killer. Another way I considered to do the job were regex, where I have found related questions in SO.
My question: is there a better, more efficient way to do it?
EDIT: before calling eval in my example $t is containing \064\065\x20a\n. It is evaluated to 45 a<LF>.

It's not quite clear what the strings in the file look like and what you do to them before passing off to eval. There's something missing in the explanation.
If you simply want to undo C-style escaping (as also used in Perl), use Encode::Escape:
use Encode qw(decode);
use Encode::Escape qw();
my $string_with_unescaped_literals = decode 'ascii-escape', $string_with_escaped_literals;
If you have placeholders in the file which look like Perl variables that you want to fill with values, then you are abusing eval as a poor man's templating engine. Use a real one that does not have the dangerous side effect of running arbitrary code.

$string =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee
string eval can solve the problem too, but it brings up a host of security and maintenance issues, like # in string

oh gah don't use eval for this, thats dangerous if someone provides it with input like "system('sync;reboot');"..
But, you could do something like this:
#!/usr/bin/perl
$string = 'foo\"ba\\\'r';
printf("%s\n", $string);
$string =~ s/\\([\"\'])/$1/g;
printf("%s\n", $string);

Related

Should I use parenthesis in postfix `if` syntax?

Are there any benefits to use parenthesis in postfix if statement? like:
return 1 if ($self->found());
One of my workmate stick to use parenthesis and It's not common for me. I would appreciate if someone point out differences.
Following is output from perl -MO=Deparse
#!/usr/bin/perl
use strict;
my $found = 1;
print "FOUND\n" if $found;
print "FOUND\n" if ($found);
__END__
test.pl syntax OK
use strict;
my $found = 1;
print "FOUND\n" if $found;
print "FOUND\n" if $found;

Parens are never needed around the condition in the if statement modifier, so consistency and playing-it-safe aren't factors. (Same goes for the other statement modifiers.)
I don't see any advantage to using them. In fact, I find them a bit jarring.
I believe most experienced Perl programmers don't use them, to the point that it's a fair indicator of "newbieness".

It's mainly a matter of style. There's no requirement to use parens around the entire conditional expression and it has no effect on the way the code behaves, so decide for yourself what standard you want to use. Personally, I don't, and my impression is that most Perl programmers don't either, but that could just be selection bias.
I will note, though, that any time I start wanting to use parens within the conditional expression, that's a pretty good sign that it shouldn't be in a postfix clause, so then I put parens around the whole thing and change it to the if (...) {...} form.

If you want to use Perl Best Practices, you should only postfix if for control of flow (break, next, last). When you use postfix if you should probably have easy enough if-statements to not need parenthesis.
For If blocks I think the best practice is to use parenthesis.

I usually use the parentheses because more often than not I guess wrong on how many statements I need to be conditional. Then I have to rewrite the line to become a proper if (condition) { BLOCK }. Having the parentheses already there make this slightly easier.
There's also the mental/readability benefit of having all if statements look the same.

Why is the perl print filehandle syntax the way it is?

I am wondering why the perl creators chose an unusual syntax for printing to a filehandle:
print filehandle list
with no comma after filehandle. I see that it's to distinguish between "print list" and "print filehandle list", but why was the ad-hoc syntax preferred over creating two functions - one to print to stdout and one to print to given filehandle?
In my searches, I came across the explanation that this is an indirect object syntax, but didn't the print function exist in perl 4 and before, whereas the object-oriented features came into perl relatively late? Is anyone familiar with the history of print in perl?

Since the comma is already used as the list constructor, you can't use it to separate semantically different arguments to print.
open my $fh, ...;
print $fh, $foo, $bar
would just look like you were trying to print the values of 3 variables. There's no way for the parser, which operates at compile time, to tell that $fh is going to refer to a file handle at run time. So you need a different character to syntactically (not semantically) distinguish between the optional file handle and the values to actually print to that file handle.
At this point, it's no more work for the parser to recognize that the first argument is separated from the second argument by blank space than it would be if it were separated by any other character.

If Perl had used the comma to make print look more like a function, the filehandle would always have to be included if you are including anything to print besides $_. That is the way functions work: If you pass in a second parameter, the first parameter must also be included. There isn't one function I can think of in Perl where the first parameter is optional when the second parameter exists. Take a look at split. It can be written using zero to four parameters. However, if you want to specify a <limit>, you have to specify the first three parameters too.
If you look at other languages, they all include two different ways ways to print: One if you want STDOUT, and another if you're printing to something besides STDOUT. Thus, Python has both print and write. C has both printf and fprintf. However, Perl can do this with just a single statement.
Let's look at the print statement a bit more closely -- thinking back to 1987 when Perl was first written.
You can think of the print syntax as really being:
print <filehandle> <list_to_print>
To print to OUTFILE, you would say:
To print to this file, you would say:
print OUTFILE "This is being printed to myfile.txt\n";
The syntax is almost English like (PRINT to OUTFILE the string "This is being printed to myfile.txt\n"
You can also do the same with thing with STDOUT:
print STDOUT "This is being printed to your console";
print STDOUT " unless you redirected the output.\n";
As a shortcut, if the filehandle was not given, it would print to STDOUT or whatever filehandle the select was set to.
print "This is being printed to your console";
print " unless you redirected the output.\n";
select OUTFILE;
print "This is being printed to whatever the filehandle OUTFILE is pointing to\n";
Now, we see the thinking behind this syntax.
Imagine I have a program that normally prints to the console. However, my boss now wants some of that output printed to various files when required instead of STDOUT. In Perl, I could easily add a few select statements, and my problems will be solved. In Python, Java, or C, I would have to modify each of my print statements, and either have some logic to use a file write to STDOUT (which may involve some conniptions in file opening and dupping to STDOUT.
Remember that Perl wasn't written to be a full fledge language. It was written to do the quick and dirty job of parsing text files more easily and flexibly than awk did. Over the years, people used it because of its flexibility and new concepts were added on top of the old ones. For example, before Perl 5, there was no such things as references which meant there was no such thing as object oriented programming. If we, back in the days of Perl 3 or Perl 4 needed something more complex than the simple list, hash, scalar variable, we had to munge it ourselves. It's not like complex data structures were unheard of. C had struct since its initial beginnings. Heck, even Pascal had the concept with records back in 1969 when people thought bellbottoms were cool. (We plead insanity. We were all on drugs.) However, since neither Bourne shell nor awk had complex data structures, so why would Perl need them?

Answer to "why" is probably subjective and something close to "Larry liked it".
Do note however, that indirect object notation is not a feature of print, but a general notation that can be used with any object or class and method. For example with LWP::UserAgent.
use strict;
use warnings;
use LWP::UserAgent;
my $ua = new LWP::UserAgent;
my $response = get $ua "http://www.google.com";
my $response_content = decoded_content $response;
print $response_content;
Any time you write method object, it means exactly the same as object->method. Note also that parser seems to only reliably work as long as you don't nest such notations or do not use complex expressions to get object, so unless you want to have lots of fun with brackets and quoting, I'd recommend against using it anywhere except common cases of print, close and rest of IO methods.

Why not? it's concise and it works, in perl's DWIM spirit.
Most likely it's that way because Larry Wall liked it that way.

I serialized my data in Perl with Data::Dumper. Now when I eval it I get "Global symbol "$VAR1" requires explicit package name"

I serialized my data to string in Perl using Data::Dumper. Now in another program I'm trying to deserialize it by using eval and I'm getting:
Global symbol "$VAR1" requires explicit package name
I'm using use warnings; use strict; in my program.
Here is how I'm evaling the code:
my $wiki_categories = eval($db_row->{categories});
die $# if $#;
/* use $wiki_categories */
How can I disable my program dying because of "$VAR1" not being declared as my?
Should I append "my " before the $db_row->{categories} in the eval? Like this:
my $wiki_categories = eval("my ".$db_row->{categories});
I didn't test this yet, but I think it would work.
Any other ways to do this? Perhaps wrap it in some block, and turn off strict for that block? I haven't ever done it but I've seen it mentioned.
Any help appreciated!

This is normal. By default, when Data::Dumper serializes data, it outputs something like:
$VAR1 = ...your data...
To use Data::Dumper for serialization, you need to configure it a little. Terse being the most important option to set, it turns off the $VAR thing.
use Data::Dumper;
my $data = {
foo => 23,
bar => [qw(1 2 3)]
};
my $dumper = Data::Dumper->new([]);
$dumper->Terse(1);
$dumper->Values([$data]);
print $dumper->Dump;
Then the result can be evaled straight into a variable.
my $data = eval $your_dump;
You can do various tricks to shrink the size of Data::Dumper, but on the whole it's fast and space efficient. The major down sides are that it's Perl only and wildly insecure. If anyone can modify your dump file, they own your program.
There are modules on CPAN which take care of this for you, and a whole lot more, such as Data::Serializer.

Your question has a number of implications, I'll try to address as many as I can.
First, read the perldoc for Data::Dumper. Setting $Data::Dumper::Terse = 1 may suffice for your needs. There are many options here in global variables, so be sure to localise them. But this changes the producer, not the consumer, of the data. I don't know how much control you have over that. Your question implies you're working on the consumer, but makes no mention of any control over the producer. Maybe the data already exists, and you have to use it as is.
The next implication is that you're tied to Data::Dumper. Again, the data may already exist, so too bad, use it. If this is not the case, I would recommend switching to another storable format. A fairly common one nowadays is JSON. While JSON isn't part of core perl, it's pretty trivial to install. It also makes this much easier. One advantage is that the data is useful in other languages, too. Another is that you avoid eval STRING which, should the data be compromised, could easily compromise your consumer.
The next item is just how to solve it as is. If the data exists, for example. A simple solution is to just add the my as you did. This works fine. Another one is to strip the $VAR1: (my $dumped = $db_row->{categories}) =~ s/^\s*\$\w+\s*=\s*//;. Another one is to put the "no warnings" right into the eval: eval ("no warnings; no strict; " . $db_row->{categories});.
Personally, I go with JSON whenever possible.

Your code would work as it stood except that the eval fails because $VAR1 is undeclared in the scope of the eval and use strict 'vars' is in effect.
Get around this by disabling strictures within as tight a block as possible. A do block does the trick, like this
my $wiki_categories = do {
no strict 'vars';
eval $db_row->{categories};
};

Concatenating strings in Perl with "join"

See Update Below
I am going through a whole bunch of Perl scripts that someone at my company wrote. He used join to concatenate strings. For example, he does this (taken hot out of a real Perl script):
$fullpath=join "", $Upload_Loc, "/", "$filename";
Instead of this:
$fullpath = "$Upload_Loc" . "/" . "$filename";
Or even just this:
$fullpath = "$Upload_Loc/$filename";
He's no longer here, but the people who are here tell me he concatenated strings in this way because it was somehow better. (They're not too clear why).
So, why would someone use join in this matter over using the . concatenate operator, or just typing the strings together as in the third example? Is there a valid reason for this style of coding?
I'm trying to clean up lot of the mess here, and my first thought would be to end this practice. It makes the code harder to read, and I'm sure doing a join is not a very efficient way to concatenate strings. However, although I've been writing scripts in Perl since version 3.x, I don't consider myself a guru because I've never had a chance to hang around with people who were better than Perl than I am and could teach me Perl's deep inner secrets. I just want to make sure that my instinct is correct here before I make a fool of myself.
I've got better ways of doing that around here.
Update
People are getting confused. He isn't just for concatenating paths. Here's another example:
$hotfix=join "", "$app", "_", "$mod", "_", "$bld", "_", "$hf", ".zip";
Where as I would do something like this:
$hotfix = $app . "_" $mod . "_" . $bld . "_" . "$hf.zip";
Or, more likely
$hotfix = "${app}_${mod}_${bld}_${hf}.zip";
Or maybe in this case, I might actually use join because the underscore causes problems:
$hotfix = join("_", $app, $mod, $bld, $hf) . ".zip";
My question is still: Is he doing something that real Perl hackers know, and a newbie like me who's been doing this for only 15 years don't know about? Do people look at me concatenating strings using . or just putting them in quotes and say "Ha! What a noob! I bet he owns a Macintosh too!"
Or, does the previous guy just has a unique style of programming much like my son's unique style of driving includes running head on into trees?

I've done my fair share of commercial Perl development for "a well known online retailer", and I've never seen join used like that. Your third example would be my preferred alternative, as it's simple, clean and readable.
Like others here, I don't see any genuine value in using join as a performance enhancer. It might well perform marginally better than string interpolation but I can't imagine a real-world situation where the optimisation could be justified yet the code still written in a scripting language.
As this question demonstrates, esoteric programming idioms (in any language) just lead to a lot of misunderstanding. If you're lucky, the misunderstanding is benign. The developers I enjoy working alongside are the ones who code for readability and consistency and leave the Perl Golf for the weekends. :)
In short: yes, I think his unique style is akin to your son's unique style of driving. :)

I would consider
$fullpath = join "/", $Upload_Loc, $filename;
clearer than the alternatives. However, File::Spec has been in the core for a long time, so
use File::Spec::Functions qw( catfile );
# ...
$fullpath = catfile $Upload_Loc, $filename;
is much better. And, better yet, there is Path::Class:
use Path::Class;
my $fullpath = file($Upload_Loc, $filename);
Speed is usually not a factor I consider in concatenating file names and paths.
The example you give in your update:
$hotfix=join "", "$app", "_", "$mod", "_", "$bld", "_", "$hf", ".zip";
demonstrates why the guy is clueless. First, there is no need to interpolate those individual variables. Second, that is better written as
$hotfix = join '_', $app, $mod, $bld, "$hf.zip";
or, alternatively, as
$hotfix = sprintf '%s_%s_%s_%s.zip', $app, $mod, $bld, $hf;
with reducing unnecessary punctuation being my ultimate goal.

In general, unless the lists of items to be joined are huge, you will not see much of a performance difference changing them over to concatenations. The main concern is readability and maintainability, and in those cases, if the string interpolation form is clearer, you can certainly use that.
I would guess that this is just a personal coding preference of the original programmer.
In general, I use join when the length of the list is large/unknown, or if I am joining with something other than the empty string (or a single space for array interpolation). Otherwise, using . or simple string interpolation is usually shorter and easier to read.

Perl compiles double-quoted strings into things with join and . catenation in them:
$ perl -MO=Deparse,-q -e '$fullpath = "$Upload_Loc/$filename"'
$fullpath = $Upload_Loc . '/' . $filename;
-e syntax OK
$ perl -MO=Deparse,-q -le 'print "Got #ARGV"'
BEGIN { $/ = "\n"; $\ = "\n"; }
print 'Got ' . join($", #ARGV);
-e syntax OK
which may inspire you to things like this:
$rx = do { local $" = "|"; qr{^(?:#args)$} };
as in:
$ perl -le 'print $rx = do { local $" = "\t|\n\t"; qr{ ^ (?xis: #ARGV ) $ }mx }' good stuff goes here
(?^mx: ^ (?xis: good |
stuff |
goes |
here ) $ )
Nifty, eh?

Interpolation is a little slower than joining a list. That said I've never known anyone to take it to this extreme.
You could use the Benchmark module to determine how much difference there is.
Also, you could ask this question over on http://perlmonks.org/. There are real gurus there who can probably give you the inner secrets much better than I can.

All of those approaches are fine.
Join can sometimes be more powerful than . concatentate, particularly when some of the things you are joining are arrays:
join "/", "~", #document_path_elements, $myDocument;

While recognizing that in all the examples I see here, there are no significant performance differences, a series of concatenation, whether with . or with double-quotish interpolation, is indeed going to be more memory-inefficient than a join, which precomputes the needed string buffer for the result instead of expanding it several times (potentially even needing to move the partial resutl to a new location each time).
I have a problem with the criticism I see leveled here; there are many right ways to speak perl, and this is certainly one of them.
Inconsistent indentation, on the other hand...

Why can't I set $LIST_SEPARATOR in Perl?

I want to set the LIST_SEPARATOR in perl, but all I get is this warning:
Name "main::LIST_SEPARATOR" used only once: possible typo at ldapflip.pl line 7.
Here is my program:
#!/usr/bin/perl -w
#vals;
push #vals, "a";
push #vals, "b";
$LIST_SEPARATOR='|';
print "#vals\n";
I am sure I am missing something obvious, but I don't see it.
Thanks

Only the mnemonic is available
$" = '|';
unless you
use English;
first.
As described in perlvar. Read the docs, please.
The following names have special meaning to Perl. Most punctuation names have reasonable mnemonics, or analogs in the shells. Nevertheless, if you wish to use long variable names, you need only say
use English;
at the top of your program. This aliases all the short names to the long names in the current package. Some even have medium names, generally borrowed from awk. In general, it's best to use the
use English '-no_match_vars';
invocation if you don't need $PREMATCH, $MATCH, or $POSTMATCH, as it avoids a certain performance hit with the use of regular expressions. See English.

perlvar is your friend:
• $LIST_SEPARATOR
• $"
This is like $, except that it applies to array and slice values interpolated into a double-quoted string (or similar interpreted string). Default is a space. (Mnemonic: obvious, I think.)
$LIST_SEPARATOR is only avaliable if you use English; If you don't want to use English; in all your programs, use $" instead. Same variable, just with a more terse name.

Slightly off-topic (the question is already well answered), but I don't get the attraction of English.
Cons:
A lot more typing
Names not more obvious (ie, I still have to look things up)
Pros:
?
I can see the benefit for other readers - especially people who don't know Perl very well at all. But in that case, if it's a question of making code more readable later, I would rather this:
{
local $" = '|'; # Set interpolated list separator to '|'
# fun stuff here...
}

you SHOULD use the strict pragma:
use strict;
you might want to use the diagnostics pragma to get additional hits about the warnings (that you already have enabled with the -w flag):
use diagnostics;

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse