Is "map" a loop? - perl

While answering this question, I came to realize that I was not sure whether Perl's map can be considered a loop or not?
On one hand, it quacks/walks like a loop (does O(n) work, can be easily re-written by an equivalent loop, and sort of fits the common definition = "a sequence of instructions that is continually repeated").
On the other hand, map is not usually listed among Perl's control structures, of which loops are a subset of. E.g. http://en.wikipedia.org/wiki/Perl_control_structures#Loops
So, what I'm looking for is a formal reason to be convinced of one side vs. the other. So far, the former (it is a loop) sounds a lot more convincing to me, but I'm bothered by the fact that I never saw "map" mentioned in a list of Perl loops.

map is a higher level concept than loops, borrowed from functional programming. It doesn't say "call this function on each of these items, one by one, from beginning to end," it says "call this function on all of these items." It might be implemented as a loop, but that's not the point -- it also might be implemented asynchronously -- it would still be map.
Additionally, it's not really a control structure in itself -- what if every perl function that used a loop in its implementation were listed under "loops?" Just because something is implemented using a loop, doesn't mean it should be considered its own type of loop.

No, it is not a loop, from my perspective.
Characteristic of (perl) loops is that they can be broken out of (last) or resumed (next, redo). map cannot:
map { last } qw(stack overflow); # ERROR! Can't "last" outside a loop block
The error message suggests that perl itself doesn't consider the evaluated block a loop block.

From an academic standpoint, a case can be made for both depending on how map is defined. If it always iterates in order, then a foreach loop could be emulated by map making the two equivalent. Some other definitions of map may allow out of order execution of the list for performance (dividing the work amongst threads or even separate computers). The same could be done with the foreach construct.
But as far as Perl 5 is concerned, map is always executed in order, making it equivalent to a loop. The internal structure of the expression map $_*2, 1, 2, 3 results in the following execution order opcodes which show that map is built internally as a while-like control structure:
OP enter
COP nextstate
OP pushmark
SVOP const IV 1
SVOP const IV 2
SVOP const IV 3
LISTOP mapstart
LOGOP (0x2f96150) mapwhile <-- while still has items, shift one off into $_
PADOP gvsv GV *_
SVOP const IV 2 loop body
BINOP multiply
goto LOGOP (0x2f96150) <-- jump back to the top of the loop
LISTOP leave

The map function is not a loop in Perl. This can be clearly seen by the failure of next, redo, and last inside a map:
perl -le '#a = map { next if $_ %2; } 1 .. 5; print for #a'
Can't "next" outside a loop block at -e line 1.
To achieve the desired affect in a map, you must return an empty list:
perl -le '#a = map { $_ %2 ? () : $_ } 1 .. 5; print for #a'
2
4
I think transformation is better name for constructs like map. It transforms one list into another. A similar function to map is List::Util::reduce, but instead of transforming a list into another list, it transforms a list into a scalar value. By using the word transformation, we can talk about the common aspects of these two higher order functions.
That said, it works by visiting every member of the list. This means it behaves much like a loop, and depending on what your definition of "a loop" is it might qualify. Note, my definition means that there is no loop in this code either:
#!/usr/bin/perl
use strict;
use warnings;
my $i = 0;
FOO:
print "hello world!\n";
goto FOO unless ++$i == 5;
Perl actually does define the word loop in its documentation:
loop
A construct that performs something repeatedly, like a roller
coaster.
By this definition, map is a loop because it preforms its block repeatedly; however, it also defines "loop control statement" and "loop label":
loop control statement
Any statement within the body of a loop that can make a loop
prematurely stop looping or skip an "iteration". Generally you
shouldn't try this on roller coasters.
loop label
A kind of key or name attached to a loop (or roller coaster) so
that loop control statements can talk about which loop they want to
control.
I believe it is imprecise to call map a loop because next and its kin are defined as loop control statements and they cannot control map.
This is all just playing with words though. Describing map as like-a-loop is a perfectly valid way of introducing someone to it. Even the documentation for map uses a foreach loop as part of its example:
%hash = map { get_a_key_for($_) => $_ } #array;
is just a funny way to write
%hash = ();
foreach (#array) {
$hash{get_a_key_for($_)} = $_;
}
It all depends on the context though. It is useful to describe multiplication to someone as repeated addition when you are trying to get him or her to understand the concept, but you wouldn't want him or her to continue to think of it that way. You would want him or her to learn the rules of multiplication instead of always translating back to the rules of addition.

Your question turns on the issue of classification. At least under one interpretation, asking whether map is a loop is like asking whether map is a subset of "Loop". Framed in this way, I think the answer is no. Although map and Loop have many things in common, there are important differences.
Loop controls: Chas. Owens makes a strong case that Perl loops are subject to loop controls like next and last, while map is not.
Return values: the purpose of map is its return value; with loops, not so much.
We encounter relationships like this all the time in the real world -- things that have much in common with each other, but with neither being a perfect subset of the other.
-----------------------------------------
|Things that iterate? |
| |
| ------------------ |
| |map() | |
| | | |
| | --------|---------- |
| | | | | |
| | | | | |
| ------------------ | |
| | | |
| | Loop| |
| ------------------ |
| |
-----------------------------------------

map is a higher-order function. The same applies to grep. Book Higher-Order Perl explains the idea in full details.
It's sad to see that discussion moved towards implementation details, not the concept.

FM's and Dave Sherohman's answers are quite good, but let me add an additional way of looking at map.
map is a function which is guaranteed to look at every element of a structure exactly once. And it is not a control structure, as it (itself) is a pure function. In other words, the invariants that map preserves are very strong, much stronger than 'a loop'. So if you can use a map, that's great, because you then get all these invariants 'for free', while if you're using a (more general!) control structure, you'll have to establish all these invariants yourself if you want to be sure your code is right.
And that's really the beauty of a lot of these higher-order functions: you get many more invariants for free, so that you as a programmer can spend your valuable thinking time maintaining application-dependent invariants instead of worrying about low-level implementation-dependent issues.

map itself is generally implemented using a loop of some sort (to loop over iterators, typically), but since it is a higher-level structure, it's often not included in lists of lower-level control structures.

Here is a definition of map as a recurrence:
sub _map (&#) {
my $f = shift;
return unless #_;
return $f->( local $_ = shift #_ ),
_map( $f, #_ );
}
my #squares = _map { $_ ** 2 } 1..100;

"Loop" is more of a CS term rather than a language-specific one. You can be reasonably confident in calling something a loop if it exhibits these characteristics:
iterates over elements
does the same thing every time
is O(n)
map fits these pretty closely, but it's not a loop because it's a higher-level abstraction. It's okay to say it has the properties of a loop, even if it itself isn't a loop in the strictest, lowest-level sense.

I think map fits the definition of a Functor.

It all depends on how you look at it...
On the one hand, Perl's map can be considered a loop, if only because that's how it's implemented in (current versions of) Perl.
On the other, though, I view it as a functional map and choose to use it accordingly which, among other things, includes only making the assumption that all elements of the list will be visited, but not making any assumptions about the order in which they will be visited. Aside from the degree of functional purity this brings and giving map a reason to exist and be used instead of for, this also leaves me in good shape if some future version of Perl provides a parallelizable implementation of map. (Not that I have any expectation of that ever happening...)

I think of map as more akin to an operator, like multiplication. You could even think of integer multiplication as a loop of additions :). It's not a loop of course, even if it were stupidly implemented that way. I see map similarly.

A map in Perl is a higher order function that applies a given function to all elements of an array and returns the modified array.
Whether this is implemented using an iterative loop or by recursion or any other way is not relevant and unspecified.
So a map is not a loop, though it may be implemented using a loop.

Map only looks like a loop if you ignore the lvalue. You can't do this with a for loop:
print join ' ', map { $_ * $_ } (1 .. 5)
1 4 9 16 25

Related

Does Perl's Glob have a limitation?

I am running the following expecting return strings of 5 characters:
while (glob '{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}'x5) {
print "$_\n";
}
but it returns only 4 characters:
anbc
anbd
anbe
anbf
anbg
...
However, when I reduce the number of characters in the list:
while (glob '{a,b,c,d,e,f,g,h,i,j,k,l,m}'x5) {
print "$_\n";
}
it returns correctly:
aamid
aamie
aamif
aamig
aamih
...
Can someone please tell me what I am missing here, is there a limit of some sort? or is there a way around this?
If it makes any difference, It returns the same result in both perl 5.26 and perl 5.28
The glob first creates all possible file name expansions, so it will first generate the complete list from the shell-style glob/pattern it is given. Only then will it iterate over it, if used in scalar context. That's why it's so hard (impossible?) to escape the iterator without exhausting it; see this post.
In your first example that's 265 strings (11_881_376), each five chars long. So a list of ~12 million strings, with (naive) total in excess of 56Mb ... plus the overhead for a scalar, which I think at minimum is 12 bytes or such. So at the order of a 100Mb's, at the very least, right there in one list.†
I am not aware of any formal limits on lengths of things in Perl (other than in regex) but glob does all that internally and there must be undocumented limits -- perhaps some buffers are overrun somewhere, internally? It is a bit excessive.
As for a way around this -- generate that list of 5-char strings iteratively, instead of letting glob roll its magic behind the scenes. Then it absolutely should not have a problem.
However, I find the whole thing a bit big for comfort, even in that case. I'd really recommend to write an algorithm that generates and provides one list element at a time (an "iterator"), and work with that.
There are good libraries that can do that (and a lot more), some of which are Algorithm::Loops recommended in a previous post on this matter (and in a comment), Algorithm::Combinatorics (same comment), Set::CrossProduct from another answer here ...
Also note that, while this is a clever use of glob, the library is meant to work with files. Apart from misusing it in principle, I think that it will check each of (the ~ 12 million) names for a valid entry! (See this page.) That's a lot of unneeded disk work. (And if you were to use "globs" like * or ? on some systems it returns a list with only strings that actually have files, so you'd quietly get different results.)
† I'm getting 56 bytes for a size of a 5-char scalar. While that is for a declared variable, which may take a little more than an anonymous scalar, in the test program with length-4 strings the actual total size is indeed a good order of magnitude larger than the naively computed one. So the real thing may well be on the order of 1Gb, in one operation.
Update A simple test program that generates that list of 5-char long strings (using the same glob approach) ran for 15-ish minutes on a server-class machine and took 725 Mb of memory.
It did produce the right number of actual 5-char long strings, seemingly correct, on this server.
Everything has some limitation.
Here's a pure Perl module that can do it for you iteratively. It doesn't generate the entire list at once and you start to get results immediately:
use v5.10;
use Set::CrossProduct;
my $set = Set::CrossProduct->new( [ ([ 'a'..'z' ]) x 5 ] );
while( my $item = $set->get ) {
say join '', #$item
}

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I though of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (apart from forcing a scalar context), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
#list_accumulator = ();
for $n in (74, …, -34) {
$. += $n;
push #list_accumulator, chr($.)
}
print(join("", #list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)#.)#_*([]#!#/)(#)#-#),#(##+#)'
^'][)#]`}`]()`#.#]#%[`}%[#`#!##%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 42 + (-2 ) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* modifier in pack interprets the value as characters. For example, it will use the value and interpret it as a character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my #relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my #absolute = map($prev_value += $_, #relative);
print pack("c*", #absolute);

Find combinations of numbers that sum to some desired number

I need an algorithm that identifies all possible combinations of a set of numbers that sum to some other number.
For example, given the set {2,3,4,7}, I need to know all possible subsets that sum to x. If x == 12, the answer is {2,3,7}; if x ==7 the answer is {{3,4},{7}} (ie, two possible answers); and if x==8 there is no answer. Note that, as these example imply, numbers in the set cannot be reused.
This question was asked on this site a couple years ago but the answer is in C# and I need to do it in Perl and don't know enough to translate the answer.
I know that this problem is hard (see other post for discussion), but I just need a brute-force solution because I am dealing with fairly small sets.
sub Solve
{
my ($goal, $elements) = #_;
# For extra speed, you can remove this next line
# if #$elements is guaranteed to be already sorted:
$elements = [ sort { $a <=> $b } #$elements ];
my (#results, $RecursiveSolve, $nextValue);
$RecursiveSolve = sub {
my ($currentGoal, $included, $index) = #_;
for ( ; $index < #$elements; ++$index) {
$nextValue = $elements->[$index];
# Since elements are sorted, there's no point in trying a
# non-final element unless it's less than goal/2:
if ($currentGoal > 2 * $nextValue) {
$RecursiveSolve->($currentGoal - $nextValue,
[ #$included, $nextValue ],
$index + 1);
} else {
push #results, [ #$included, $nextValue ]
if $currentGoal == $nextValue;
return if $nextValue >= $currentGoal;
}
} # end for
}; # end $RecursiveSolve
$RecursiveSolve->($goal, [], 0);
undef $RecursiveSolve; # Avoid memory leak from circular reference
return #results;
} # end Solve
my #results = Solve(7, [2,3,4,7]);
print "#$_\n" for #results;
This started as a fairly direct translation of the C# version from the question you linked, but I simplified it a bit (and now a bit more, and also removed some unnecessary variable allocations, added some optimizations based on the list of elements being sorted, and rearranged the conditions to be slightly more efficient).
I've also now added another significant optimization. When considering whether to try using an element that doesn't complete the sum, there's no point if the element is greater than or equal to half the current goal. (The next number we add will be even bigger.) Depending on the set you're trying, this can short-circuit quite a bit more. (You could also try adding the next element instead of multiplying by 2, but then you have to worry about running off the end of the list.)
The rough algorithm is as follows:
have a "solve" function that takes in a list of numbers already included and a list of those not yet included.
This function will loop through all the numbers not yet included.
If adding that number in hits the goal then record that set of numbers and move on,
if it is less than the target recursively call the function with the included/exluded lists modified with the number you are looking at.
else just go to the next step in the loop (since if you are over there is no point trying to add more numbers unless you allow negative ones)
You call this function initially with your included list empty and your yet to be included list with your full list of numbers.
There are optimisations you can do with this such as passing the sum around rather than recalculating each time. Also if you sort your list initially you can do optimisations based on the fact that if adding number k in the list makes you go over target then adding k+1 will also send you over target.
Hopefully that will give you a good enough start. My perl is unfortuantely quite rusty.
Pretty much though this is a brute force algorithm with a few shortcuts in it so its never going to be that efficient.
You can make use of the Data::PowerSet module which generates all subsets of a list of elements:
Use Algorithm::Combinatorics. That way, you can decide ahead of time what size subsets you want to consider and keep memory use to a minimum. Apply some heuristics to return early.
#!/usr/bin/perl
use strict; use warnings;
use List::Util qw( sum );
use Algorithm::Combinatorics qw( combinations );
my #x = (1 .. 10);
my $target_sum = 12;
{
use integer;
for my $n ( 1 .. #x ) {
my $iter = combinations(\#x, $n);
while ( my $set = $iter->next ) {
print "#$set\n" if $target_sum == sum #$set;
}
}
}
The numbers do blow up fairly rapidly: It would take thousands of days to go through all subsets of a 40 element set. So, you should decide on the interesting sizes of subsets.
Is this a 'do my homework for me' question?
To do this deterministically would need an algorithm of order N! (i.e. (N-0) * (N-1) * (N-2)...) which is going to be very slow with large sets of inputs. But the algorithm is very simple: work out each possible sequence of the inputs in the set and try adding up the inputs in the sequence. If at any point the sum matches, you've got one of the answers, save the result and move on to the next sequence. If at any point the sum is greater than the target, abandon the current sequence and move on to the next.
You could optimize this a little by deleting any of the inputs greater than the target. Another approach for optimization would be to to take the first input I in the sequence and create a new sequence S1, deduct I from the target T to get a new target T1, then check if T exists in S1, if it does then you've got a match, otherwise repeat the process with S1 and T1. The order is still N! though.
If you needed to do this with a very large set of numbers then I'd suggest reading up on genetic algorithms.
C.
Someone posted a similar question a while ago and another person showed a neat shell trick to answer it. Here is a shell technique, but I don't think it is as neat a solution as the one I saw before (so I'm not taking credit for this approach). It's cute because it takes advantage of shell expansion:
for i in 0{,+2}{,+3}{,+4}{,+7}; do
y=$(( $i )); # evaluate expression
if [ $y -eq 7 ]; then
echo $i = $y;
fi;
done
Outputs:
0+7 = 7
0+3+4 = 7

Limiting the amount of information printed by Perl debugger

One of my pet peeves with debugging Perl code (in command line debbugger, perl -d) is the fact that mistakenly printing (via x command) the contents of a huge datastructure is guaranteed to freeze up your terminal for forever and a half while 100s of pages of data are printed. Epecially if that happens across slowish network.
As such, I'd like to be able to limit the amount of data that x prints.
I see two approaches - I'd be willing to try either if someone knows how to do.
Limit the amount of data any single command in debugger prints.
Better yet, somehow replace the built-in x command with a custom Perl method (which would calculate the "size" of the data structure, and refuse to print its contents without confirmation).
I'm specifically asking "how to replace x with custom code" - building a Good Enough "is the data structure too big" Perl method is something I can likely do on my own without too much effort although I see enough pitfalls preventing the "perfect" one from being a fairly frustrating endeavour. Heck, merely doing Data::Dumper->Dump and taking the length of the string might do the trick :)
Please note that I'm perfectly well aware of how to manually avoid the issue by recursively examining layers of datastructure (e.g. print the ref, print the count of keys/array elements, etc...)... the whole point is I want to be able to avoid thoughtlessly typing x $huge_pile_of_data without thinking - or stumbling on a bug populating said huge pile of data into what should be a scalar.
The x command takes an optional argument for the maximum depth to display. That's not quite the same as limiting the amount of data to N pages, but it's definitely useful to prevent overload.
DB<1> %h = (a => { b => { c => 1 } } )
DB<2> x %h
0 'a'
1 HASH(0x1d5ff44)
'b' => HASH(0x1d61424)
'c' => 1
DB<3> x 2 %h
0 'a'
1 HASH(0x1d5ff44)
'b' => HASH(0x1d61424)
You can specify the default depth to print via the o command, e.g.
DB<1>o dumpDepth=1
Add that to your .perldb file to apply it to all debugger sessions.
Otherwise, it looks like the x command invokes DB::dumpit() which is just a wrapper for dumpval.pl (or, more specifically, the main::dumpValue() sub it defines). You could modify/replace that script as you see fit. I'm not sure how you'd make it interactive, though.
The | command in the debugger pipes another command's output to your pager, e.g.
DB<1> |x %huge_datastructure

When is Perl's scalar comma operator useful?

Is there any reason to use a scalar comma operator anywhere other than in a for loop?
Since the Perl scalar comma is a "port" of the C comma operator, these comments are probably apropos:
Once in a while, you find yourself in
a situation in which C expects a
single expression, but you have two
things you want to say. The most
common (and in fact the only common)
example is in a for loop, specifically
the first and third controlling
expressions. What if (for example) you
want to have a loop in which i counts
up from 0 to 10 at the same time that
j is counting down from 10 to 0?
So, your instinct that it's mainly useful in for loops is a good one, I think.
I occasionally use it in the conditional (sometimes erroneously called "the ternary") operator, if the code is easier to read than breaking it out into a real if/else:
my $blah = condition() ? do_this(), do_that() : do_the_other_thing();
It could also be used in some expression where the last result is important, such as in a grep expression, but in this case it's just the same as if a semicolon was used:
my #results = grep { setup(), condition() } #list;