When can modifying $_ be wrong? - perl

When could the following code have side effects?
@some = map { s/xxx/y/; $_ } @some;
perlcritic flags it as dangerous because, for example, in
@other = map { s/xxx/y/; $_ } @some;
the members of @some also get modified. Understood. I have the PBP book (Perl Best Practices), and it illustrates the above with the example
@pm_files_without_pl_files
= grep { s/.pm\z/.pl/xms && !-e } @pm_files;
and I have also read the chapter "List Processing Side Effects" / "Never modify $_ in a list function." and the sections that follow. I also know about /r.
To be clear (as much as is possible with my terrible English):
In the 1st example the main point is to modify the original @some.
The question is:
could the 1st example, @some = map { s/xxx/y/; $_ } @some;, cause some unwanted side effects? If yes, when?
Or is it just the "not recommended" way (but otherwise harmless)?
I'm looking for an answer that goes a bit deeper than a "Perl beginner's book", which is why I haven't accepted any current answer yet. ;)

One of the mottos of perl has always been TIMTOWTDI: there is more than one way to do it. If two ways have the same end result, they're equally correct. That doesn't mean there aren't reasons to prefer one way over the other.
In the first case, it would be more obvious (to me, YMMV) to do something like
s/xxx/y/ for @some;
This is mainly because it communicates intent better. for suggests it's all about the side effect, whereas map suggests it's about the return value. While functionally identical, this should be much easier to understand for your fellow programmer (and probably for yourself 6 months from now).
There's more than one way, but some are better than others.

Code like your example:
@some = map { s/xxx/y/; $_ } @some;
should be avoided because it's redundant and confusing. It looks like the assignment on the left should be doing something, even though it's actually a no-op. Indeed, just writing:
map { s/xxx/y/; $_ } @some;
would have the exact same effect, as would:
map { s/xxx/y/ } @some;
This version at least has the virtue of making it (reasonably) clear that the return value of map is being ignored, and that the actual purpose of the statement is to modify @some in place.
But of course, as Leon has already pointed out, by far the clearest and most idiomatic way of writing this would be:
s/xxx/y/ for @some;

@some = map { s/xxx/y/; $_ } @some;
will work fine, but it's very poor code because it's not obvious that you're effectively doing
map { s/xxx/y/ } @some; @some = @some;
This already shows you could simply have done
map { s/xxx/y/ } @some;
But that's a misleading and inefficient version of
s/xxx/y/ for @some;
It's all about readability and maintainability.
Note that you can do
use List::MoreUtils qw( apply );
@some = apply { s/xxx/y/ } @some;
And in Perl 5.14+,
@some = map { s/xxx/y/r } @some;
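To make the aliasing discussed in these answers concrete, here is a minimal sketch. The sample data and the variable names @orig and @copy are assumptions made for illustration. It shows that map aliases $_ to each element, so a destructive s/// changes the source array, while /r works on a copy.

```perl
use strict;
use warnings;

my @some = ('xxx1', 'axxx', 'plain');    # assumed sample data

# map aliases $_ to each element, so s/// edits @some in place
my @other = map { s/xxx/y/; $_ } @some;
print "some:  @some\n";     # some:  y1 ay plain
print "other: @other\n";    # other: y1 ay plain

# /r (Perl 5.14+) returns the changed copy and leaves the source alone
my @orig = ('xxx1', 'axxx', 'plain');
my @copy = map { s/xxx/y/r } @orig;
print "orig:  @orig\n";     # orig:  xxx1 axxx plain
print "copy:  @copy\n";     # copy:  y1 ay plain
```

So the side effect only matters when the source array is still needed in its original form, which is not the case when the result is assigned straight back to it.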

Related

Perl sub returns a subroutine

I haven't used Perl for around 20 years, and this is confusing me. I've g******d for it, but I obviously haven't used a suitable search string because I haven't found anything relating to this...
Why would I want to do the following? I understand what it's doing, but the "why" escapes me. Why not just return 0 or 1 to begin with?
I'm working on some code where a sub uses "return sub"; here's a very truncated example e.g.
sub test1 {
$a = shift @_;
if ($a eq "cat") {
return sub {
print("cat test OK\n");
return 0;
}
}
# default if "cat" wasn't the argument
return sub {
print("test for cat did not work\n");
return 1;
}
}
$c = test1("cat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
$c = test1("bat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
In your code there is no reason to return a sub. However, with a little tweak
sub test1 {
my $animal = shift @_;
if ($animal eq "cat" || $animal eq "dog") {
return sub {
print("$animal test OK\n");
return 0;
};
}
# default if "cat" or "dog" wasn't the argument
return sub {
print("test for cat or dog did not work\n");
return 1;
};
}
We now have a closure around $animal. This saves memory, as the tests for cat and dog share the same code. Note that this only works with my variables. Also note that $a and $b are slightly special to Perl: they are used in the block of code that you pass to the sort function and bypass some of the visibility checks, so it's best to avoid them for anything except sort.
You probably want to search "perl closures".
There are many reasons that you'd want to return a code reference, but it's not something I can answer briefly in a StackOverflow question. Mark Jason Dominus's Higher Order Perl is a good way to expand your mind, and we cover a little of that in Intermediate Perl.
I wrote File::Find::Closures as a way to demonstrate this in class. Each subroutine in that module returns two code references—one for the callback to File::Find and the other as a way to access the results. The two share a common variable which nothing else can access.
Notice in your case, you aren't merely calling a subroutine to "get a zero". It's doing other things. Even in your simple example there's some output. Some behavior is then deferred until you actually use the result for something.
Having said that, we have no chance of understanding why the programmer who wrote your particular code did it. One plausible guess was that the system was set up for more complex situations and you're looking at a trivial example that fits into that. Another plausible guess was that the programmer just learned that feature and fell in love with it then used it everywhere for a year. There's always a reason, but that doesn't mean there's always a good reason.
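The "two code references sharing one private variable" idea can be sketched in a few lines. This is not the File::Find::Closures code; make_collector and its two closures are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns two closures that share the private @seen.
sub make_collector {
    my @seen;                                 # visible only to the subs below
    my $add  = sub { push @seen, @_; return scalar @seen };
    my $list = sub { return @seen };
    return ($add, $list);
}

my ($add, $list) = make_collector();
$add->('cat');
$add->('dog');
print join(',', $list->()), "\n";             # cat,dog

# A second call gets its own, independent @seen
my ($add2, $list2) = make_collector();
$add2->('bat');
print join(',', $list2->()), "\n";            # bat
```

Each call to make_collector creates a fresh @seen, which is why returning a sub can carry state that returning a plain 0 or 1 cannot.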

how to find multiple regex patterns in a single way using Perl

Question Updated
I have list of (few more) regex patterns like: (Note: Sequence is very Important)
([a-z]+)(\d+)
\}([a-z]+)
([a-z]+)(\+|\-)
([0-9])\](\+|\-)
...
...
my input file like :
\ce{CO2}
\ce{2CO}
\ce{H2O}
\ce{Sb2O3}
...
...
In my code I test each regex pattern separately, like this:
if($string=~m/([a-z]+)(\d+)/g) { my statements ... }
if($string=~m/\}([a-z]+)/g) { my statements ... }
if($string=~m/([a-z]+)(\+|\-)/g) { my statements ... }
if($string=~m/([0-9])\](\+|\-)/g) { my statements ... }
Instead of the code above, is there another way to simplify this?
Could someone please share their thoughts on how I could improve my code?
Disclaimer: Your question is very hard to read, so this is pretty much guesswork. I am not sure I understand what you want to do.
When you are processing data in a dynamic way, a typical approach is to use a dispatch table. We can do something similar here. Often a hash or hash reference is used for that, but since we want a specific order, I will be using an array instead.
my @dispatch = (
{
pattern => qr/f(o)(o)/,
callback => sub {
my ($one, $two) = @_;
print "Found $one and $two\n";
},
},
{
pattern => qr/(bar)/,
callback => sub {
my $capture = shift;
print "Saw $capture";
},
},
);
This basically is a list of search patterns and associated instructions. Each pattern has a callback, which is a code reference. I decided it would make sense to pass in the capture variables, because your patterns have capture groups.
Now in order to call them, we iterate over the dispatch array, match the pattern and then call the associated callback, passing in all the captures.
my $text = "Foo bar foo bar baz.";
foreach my $search (@dispatch) {
if ($text =~ $search->{pattern}) {
$search->{callback}->(@{^CAPTURE}); # this requires Perl 5.26
}
}
Please note that I am using @{^CAPTURE}, which was added to Perl in version 5.25.7, so you would require at least the stable Perl 5.26 release to use it. (On an older Perl, if (my @capture = $text =~ $search->{pattern}) { $search->{callback}->(@capture) } will behave similarly. Declare @capture in the if condition rather than chaining the call with and, because a my variable is not visible until the statement that declares it has ended.)
This is way more elegant than having a list of if () {} statement because it's very easy to extend. The dispatch table could be created on the fly, based on some input, or entirely read from disk.
When we run this code, it creates the following output
Found o and o
Saw bar
This is not very spectacular, but you should be able to adapt it to your patterns. On the other hand I don't know what you are actually trying to do. If you wanted to modify the string instead of only matching, you might need additional arguments for your callbacks.
If you want to learn more about dispatch tables, I suggest you read the second chapter of Mark Jason Dominus' excellent book Higher Order Perl, which is available for free as a PDF on his website.
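As a rough sketch of how the table could look with the first three patterns from the question, the following runs on Perls older than 5.26 by capturing through a list assignment. The sample strings and the text built by each callback are assumptions made for illustration, not the asker's real statements.

```perl
use strict;
use warnings;

# Ordered dispatch table: patterns are tried in array order.
my @dispatch = (
    { pattern => qr/([a-z]+)(\d+)/,   callback => sub { "letters=$_[0] digits=$_[1]" } },
    { pattern => qr/\}([a-z]+)/,      callback => sub { "after brace=$_[0]" } },
    { pattern => qr/([a-z]+)(\+|\-)/, callback => sub { "letters=$_[0] sign=$_[1]" } },
);

my @results;
for my $string ('ab12', 'x}yz', 'na+') {      # assumed sample input
    for my $search (@dispatch) {
        # list assignment returns the captures and is true when the match succeeds
        if (my @capture = $string =~ $search->{pattern}) {
            push @results, "$string: " . $search->{callback}->(@capture);
        }
    }
}
print "$_\n" for @results;
# ab12: letters=ab digits=12
# x}yz: after brace=yz
# na+: letters=na sign=+
```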
Your question is hard to read, mainly because you have /g at the end of your regex searches (which returns every match in list context), yet you only check whether each pattern matches once.
I'm making the following assumptions
All matches are required
the code can be a single or double match
both groups captured in one line
I think you want
while ( $string =~ /(([a-z]+)(\d+)|\}([a-z]+)|([a-z]+)(\+|\-)|([0-9])\](\+|\-))/g )
{
# $1 has the whole match
# $2 and $3 are set when the first alternative matched
# $4, $5/$6 and $7/$8 belong to the later alternatives; unused groups are undef
}
However, I prefer the method below. This will capture in one line
while ($string =~ /([a-z]+\d+|\}[a-z]+|[a-z]+[+-]|[0-9]\][+-])/g )
{
# in here split the match if required
}
I recommend you use regex comments to make this clearer.
If you just want a single match, use
if(
$string=~m/([a-z]+)(\d+)/ ||
$string=~m/\}([a-z]+)/ ||
$string=~m/([a-z]+)(\+|\-)/ ||
$string=~m/([0-9])\](\+|\-)/
)
{
#some code
}
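The regex comments recommended above could look roughly like this with the /x flag. The name $token and the sample string are assumptions made for illustration.

```perl
use strict;
use warnings;

# One alternation, one capture group, each branch documented in place.
my $token = qr/
    ( [a-z]+ \d+        # letters followed by digits
    | \} [a-z]+         # closing brace followed by letters
    | [a-z]+ [+-]       # letters followed by a sign
    | [0-9] \] [+-]     # digit, closing bracket, sign
    )
/x;

my $string  = 'ab12 }cd ef+ 3]-';           # assumed sample input
my @matches = $string =~ /$token/g;          # list context: every $1
print join(' | ', @matches), "\n";           # ab12 | }cd | ef+ | 3]-
```

With a single capture group, /g in list context returns one element per match, which avoids the numbered-group bookkeeping of the first version.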

Is there something like `last` for `map`?

In Perl, is it possible to arbitrarily end a map execution, e.g. something equivalent to last in a loop?
It would be a bit like this:
map {
if (test == true) { last; } dosomething
} @myarray
Nope. You can't last, next, etc. out of a map, grep or sort blocks because they are not loops.
Having said that, the code snippet in the question is poor practice because map should never be called in void context. They are supposed to be used to transform one list into another.
Here's a better way to write it if you really must inline it (modified after ysth's comment):
$_ == 10 && last, print for 1..15; # prints '123456789'
No. Use an ordinary foreach loop and the last statement.
Since 5.8.1, map is context aware - in void context, no lists are constructed.
Anyway, map is generally used to get a list from another list, evaluating expr for each element of the original list.
You could use a do-block with a for statement modifier:
do {
last if test;
dosomething;
} for (@myarray);
Though using a foreach block would probably be clearer, and future maintainers of your code will thank you.
foreach (@myarray) {
last if test;
dosomething;
}
You can use a long jump (eval/die pair) to bail out of any nested construct that doesn't directly support it:
eval { map{ die if test; dosomething } @myarray };
But as Zaid said, using a for/foreach loop in this case is better because you are not using the return value of map.
You want a for loop:
foreach ( @myarray ) {
last if test;
...
}
It does the same thing. map is for transforming one list into other lists.
There are map-like constructs that do exactly what you want to do. Take a look at List::Util and List::MoreUtils (conveniently also packaged together as List::AllUtils):
use List::Util 'first';
# get first element whose {foo} value is 'some string'
my $match = first { $_->{foo} eq 'some string' } @elements;
If you don't want to extract an element(s) from the list, then use foreach, as per the previous answers.
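The two suggestions from these answers, a foreach loop with last and first from List::Util, can be put side by side in a small sketch. The sample numbers and the "greater than 10" test are assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util 'first';

my @myarray = (3, 8, 1, 12, 5, 20);    # assumed sample data

# foreach supports last, so processing stops at the first value over 10
my @doubled;
foreach my $n (@myarray) {
    last if $n > 10;
    push @doubled, $n * 2;
}
print "@doubled\n";    # 6 16 2

# first returns the first matching element and stops scanning there
my $big = first { $_ > 10 } @myarray;
print "$big\n";        # 12
```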
Try goto LABEL. However, I do not know how safe that is.

How can a Perl subroutine force its caller to return? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Is it possible for a Perl subroutine to force its caller to return?
I want to write a subroutine which causes the caller to return under certain conditions. This is meant to be used as a shortcut for validating input to a function. What I have so far is:
sub needs($$) {
my ($condition, $message) = @_;
if (not $condition) {
print "$message\n";
# would like to return from the *parent* here
}
return $condition;
}
sub run_find {
my $arg = shift @_;
needs $arg, "arg required" or return;
needs exists $lang{$arg}, "No such language: $arg" or return;
# etc.
}
The advantage of returning from the caller in needs would then be to avoid having to write the repetitive or return inside run_find and similar functions.
I think you're focussing on the wrong thing here. I do this sort of thing with Data::Constraint, Brick, etc. and talk about this in Mastering Perl. With a little cleverness and thought about the structure of your program and the dynamic features that Perl has, you don't need such a regimented, procedural approach.
However, the first thing you need to figure out is what you really want to know in that calling subroutine. If you just want to know yes or no, it's pretty easy.
The problem with your needs is that you're thinking about calling it once for every condition, which forces you to use needs to control program flow. That's the wrong way to go. needs is only there to give you an answer. Its job is not to change program state. It becomes much less useful if you misuse it because some other calling subroutine might want to continue even if needs returns false. Call it once and let it return once. The calling subroutine uses the return value to decide what it should do.
The basic structure involves a table that you pass to needs. This is your validation profile.
sub run_find {
my $arg = shift @_;
return unless needs([
[ sub { $arg }, "arg required" ],
[ sub { exists $lang{$arg} }, "No such language: $arg" ],
]);
...
}
You construct your table for whatever your requirements are. In needs you just process the table:
sub needs {
my ($table) = @_;
foreach my $test ( @$table ) {
my( $sub, $message ) = @$test;
unless( $sub->(...) ) {
print $message;
return;
}
}
return 1;
}
Now, the really cool thing with this approach is that you don't have to know the table ahead of time. You can pull that from configuration or some other method. That also means that you can change the table dynamically. Now your code shrinks quite a bit:
sub run_find {
my $arg = shift @_;
return unless needs( $validators{run_find} );
...
}
You can keep going with this. In Mastering Perl I show a couple of solutions that completely remove that from the code and move it into a configuration file. That is, you can change the business rules without changing the code.
Remember, almost any time that you are typing the same sequence of characters, you're probably doing it wrong. :)
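Put together, the table-driven approach might look like the sketch below. The contents of %lang, the return value of run_find, and the choice to call each validator with no arguments (they close over $arg) are assumptions made for illustration.

```perl
use strict;
use warnings;

my %lang = (perl => 1, ruby => 1);      # assumed sample data

# Walk the validation table; report and stop at the first failing check.
sub needs {
    my ($table) = @_;
    foreach my $test (@$table) {
        my ($check, $message) = @$test;
        unless ($check->()) {
            print "$message\n";
            return;                      # false: a validation failed
        }
    }
    return 1;                            # every check passed
}

sub run_find {
    my $arg = shift;
    return unless needs([
        [ sub { $arg },               "arg required" ],
        [ sub { exists $lang{$arg} }, "No such language: $arg" ],
    ]);
    return "searching $arg";             # hypothetical "real work"
}

my $missing = run_find('');              # prints "arg required"
my $unknown = run_find('cobol');         # prints "No such language: cobol"
my $found   = run_find('perl');
print "$found\n";                        # searching perl
```

needs is called once and only answers yes or no; the caller decides what to do with that answer.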
Sounds like you are re-inventing exception handling.
The needs function should not magically deduce its parent and interrupt the parent's control flow - that's bad manners. What if you add additional functions to the call chain, and you need to go back two or even three functions back? How can you determine this programmatically? Will the caller be expecting his or her function to return early? You should follow the principle of least surprise if you want to avoid bugs - and that means using exceptions to indicate that there is a problem, and having the caller decide how to deal with it:
use Carp;
use Try::Tiny;
sub run_find {
my $arg = shift;
defined $arg or croak "arg required";
exists $lang{$arg} or croak "no such language: $arg";
...
}
sub parent {
try { run_find('foo') }
catch { print $_; };
}
Any code inside of the try block is special: if something dies, the exception is caught and passed to the catch block in $_ (Try::Tiny does not leave it in $@). In this case, the catch block is executed, which prints the error to STDOUT and control flow continues as normal.
Disclaimer: exception handling in Perl is a pain. I recommend Try::Tiny, which protects against many common gotchas (and provides familiar try/catch semantics) and Exception::Class to quickly make exception objects so you can distinguish between Perl's errors and your own.
For validation of arguments, you might find it easier to use a CPAN module such as Params::Validate.
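For comparison, the same flow can be sketched with only core Perl (eval and die via Carp's croak) instead of Try::Tiny. The contents of %lang and the strings returned by run_find and parent are assumptions made for illustration.

```perl
use strict;
use warnings;
use Carp;

my %lang = (perl => 1);                  # assumed sample data

sub run_find {
    my $arg = shift;
    defined $arg       or croak "arg required";
    exists $lang{$arg} or croak "no such language: $arg";
    return "found $arg";                 # hypothetical result
}

# The caller, not run_find, decides what happens after a failure.
sub parent {
    my $arg = shift;
    my $result;
    my $ok = eval { $result = run_find($arg); 1 };
    return "error: $@" unless $ok;       # $@ holds the croak message
    return $result;
}

print parent('perl'), "\n";              # found perl
print parent('cobol');                   # starts with "error: no such language: cobol"
```

The trailing 1 inside the eval block makes success explicit, so a failure is detected from the return value of eval rather than by testing $@ alone.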
You may want to look at a similar recent question by kinopiko:
Is it possible for a Perl subroutine to force its caller to return?
The executive summary for that is: the best solution is to use exceptions (die/eval, Try::Tiny, etc...). You can also use goto and possibly Continuation::Escape.
It doesn't make sense to do things this way; ironically, ya doesn't needs needs.
Here's why.
run_find is poorly written. If your first condition is true, you'll never test the second one since you'll have returned already.
The warn and die functions will provide you printing and/or exiting behavior anyway.
Here's how I would write your run_find sub if you wanted to terminate execution if your argument fails (renamed it to well_defined):
sub well_defined {
my $arg = shift;
$arg or die "arg required";
exists $lang{$arg} or die "no such language: $arg";
return 1;
}
There should be a way to return 0 and warn at the same time, but I'll need to play around with it a little more.
run_find can also be written to return 0 and the appropriate warn message if conditions are not met, and return 1 if they are (renamed to well_defined).
sub well_defined {
my $arg = shift;
$arg or warn "arg required" and return 0;
exists $lang{$arg} or warn "no such language: $arg" and return 0;
return 1;
}
This enables Boolean-esque behavior, as demonstrated below:
perform_calculation $arg if well_defined $arg; # executes only if well-defined

How can I return a list of hashrefs from a map in Perl?

I have the following mostly ok code:
my $results = { data => [
map {
my $f = $_->TO_JSON;
$f->{display_field} = $_->display_field($q);
$f;
} $rs->all
]};
Only I'd rather it were more like the following:
my $results = { data => [
map {
%{$_->TO_JSON},
display_field => $_->display_field($q),
}, $rs->all
]};
But that gives a syntax error. How can I do what I want, or is my current version the best it gets?
update: sorry about the extra semicolon from before. It's late here. Not sure how I missed it. Thanks guys!
It only gives a syntax error because Perl thinks you need to omit the comma after map { ... }, because it is parsing that map as being a block, not an expression. Putting + in front will fix that. Also, you can't have a semicolon in an anonymous hash:
my $results = { data => [
map +{
# ^----------------- plus sign added
%{$_->TO_JSON},
display_field => $_->display_field($q);
# ^---- should be comma or nothing
}, $rs->all
]};
The problem is that Perl doesn't look ahead far enough to figure out whether { means "start an anonymous hash reference" or "start a code block". It should (ideally) look to the corresponding } and see if there is or isn't a comma, and act accordingly, but it doesn't. It only looks a little bit ahead and tries to guess. And this time it's wrong, and you get a syntax error about a comma that shouldn't be there, except that it should so don't move it.
perldoc -f map will tell you all about this. Basically, it says that if you put +{, Perl will understand that this means "not a code block" and guess that it's a hash reference. This is probably the cause of your syntax error. As another suggestion, it might work to say map({ HASH STUFF }, $rs->all) - I bet money Perl won't guess it's a code reference here.
I couldn't get it to work, but not having $rs or a ->TO_JSON or a variable named $q I couldn't get any of this to work anyway. I hope this helps. If not, post a little more code. Don't worry, we don't bite.
Also, while we're at it, why not write it this way:
my $results;
$results->{data} = [ map MAGIC MAP STUFF, $rs->all ];
Might arguably be more readable, especially if you're adding a lot of stuff to $results all at once.
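Since the original $rs, TO_JSON and $q are not available, here is a stand-in sketch that uses plain hashrefs to show the map +{ ... }, LIST form producing one new hashref per input element. The @rows data and the use of uc to build display_field are assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for $rs->all: plain hashrefs instead of result objects
my @rows = ({ id => 1, name => 'ann' }, { id => 2, name => 'bob' });

my $results = { data => [
    map +{
        %$_,                               # copy the existing fields
        display_field => uc $_->{name},    # add one computed field
    }, @rows
] };

print scalar @{ $results->{data} }, "\n";          # 2
print $results->{data}[0]{display_field}, "\n";    # ANN
print $results->{data}[1]{id}, "\n";               # 2
```

The leading + tells Perl that the brace starts an anonymous hash expression, which is why the comma before the list is required in this form.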
I'm not completely sure what kind of structure you're looking for. The map in your first example already returns a list of hashrefs (every version of $f).
If you just want syntax similar to your second example, then you were almost right; you need to get rid of the extraneous semicolon in your map block, and use a pair of curlies to make an anonymous hash ref.
Something like:
my $results = { data => [
map { { %{$_->TO_JSON},
display_field => $_->display_field($q)
}
} $rs->all
]};
I just always use map in the block form, and structure the code so it's easy to pick apart. Although you can put a + in front of the opening curly to use the expression form, does it really matter to you that much?
Aside from everything else going on, your first example looks fine. Move on and solve real problems. :)