Find combinations of numbers that sum to some desired number - perl

I need an algorithm that identifies all possible combinations of a set of numbers that sum to some other number.
For example, given the set {2,3,4,7}, I need to know all possible subsets that sum to x. If x == 12, the answer is {2,3,7}; if x == 7 the answer is {{3,4},{7}} (i.e., two possible answers); and if x == 8 there is no answer. Note that, as these examples imply, numbers in the set cannot be reused.
This question was asked on this site a couple of years ago, but the answer is in C#, and I need to do it in Perl and don't know enough to translate the answer.
I know that this problem is hard (see other post for discussion), but I just need a brute-force solution because I am dealing with fairly small sets.

sub Solve
{
    my ($goal, $elements) = @_;
    # For extra speed, you can remove this next line
    # if @$elements is guaranteed to be already sorted:
    $elements = [ sort { $a <=> $b } @$elements ];
    my (@results, $RecursiveSolve, $nextValue);
    $RecursiveSolve = sub {
        my ($currentGoal, $included, $index) = @_;
        for ( ; $index < @$elements; ++$index) {
            $nextValue = $elements->[$index];
            # Since elements are sorted (and assumed distinct), there's no point
            # in trying a non-final element unless it's less than goal/2:
            if ($currentGoal > 2 * $nextValue) {
                $RecursiveSolve->($currentGoal - $nextValue,
                                  [ @$included, $nextValue ],
                                  $index + 1);
            } else {
                push @results, [ @$included, $nextValue ]
                    if $currentGoal == $nextValue;
                return if $nextValue >= $currentGoal;
            }
        } # end for
    }; # end $RecursiveSolve
    $RecursiveSolve->($goal, [], 0);
    undef $RecursiveSolve; # Avoid memory leak from circular reference
    return @results;
} # end Solve
my @results = Solve(7, [2,3,4,7]);
print "@$_\n" for @results;
This started as a fairly direct translation of the C# version from the question you linked, but I simplified it a bit (and now a bit more, and also removed some unnecessary variable allocations, added some optimizations based on the list of elements being sorted, and rearranged the conditions to be slightly more efficient).
I've also now added another significant optimization. When considering whether to try using an element that doesn't complete the sum, there's no point if the element is greater than or equal to half the current goal. (The next number we add will be even bigger.) Depending on the set you're trying, this can short-circuit quite a bit more. (You could also try adding the next element instead of multiplying by 2, but then you have to worry about running off the end of the list.)

The rough algorithm is as follows:
Have a "solve" function that takes in a list of numbers already included and a list of those not yet included.
This function will loop through all the numbers not yet included.
If adding that number hits the goal, then record that set of numbers and move on.
If the sum is less than the target, recursively call the function with the included/excluded lists modified with the number you are looking at.
Otherwise, just go to the next step in the loop (since if you are over, there is no point trying to add more numbers unless you allow negative ones).
You call this function initially with your included list empty and your yet-to-be-included list containing your full list of numbers.
There are optimisations you can do with this, such as passing the sum around rather than recalculating it each time. Also, if you sort your list initially, you can do optimisations based on the fact that if adding the kth number in the list makes you go over the target, then adding the (k+1)th will also send you over the target.
Hopefully that will give you a good enough start. My Perl is unfortunately quite rusty.
Pretty much, though, this is a brute-force algorithm with a few shortcuts in it, so it's never going to be that efficient.
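The rough algorithm above can be sketched in Perl as follows. This is a minimal illustration, not the answerer's own code: the helper name find_subsets is hypothetical, the running sum is passed along instead of being recalculated, and the input is assumed to be positive numbers. To avoid reporting the same subset in different orders, each recursive call only considers the numbers that come after the one just added.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every subset of @$numbers that sums to $target.
# Assumes positive numbers.
sub find_subsets {
    my ($target, $numbers) = @_;
    my @sorted = sort { $a <=> $b } @$numbers;
    my @found;

    my $solve;
    $solve = sub {
        my ($included, $sum, $start) = @_;
        for my $i ($start .. $#sorted) {
            my $next    = $sorted[$i];
            my $new_sum = $sum + $next;
            # Sorted input: if this number overshoots, every later one does too.
            last if $new_sum > $target;
            if ($new_sum == $target) {
                push @found, [ @$included, $next ];   # record it and move on
            }
            else {
                # Still under target: recurse on the not-yet-included tail only.
                $solve->([ @$included, $next ], $new_sum, $i + 1);
            }
        }
    };
    $solve->([], 0, 0);
    undef $solve;    # break the closure's reference to itself
    return @found;
}

print "@$_\n" for find_subsets(7, [2, 3, 4, 7]);
```

With the question's example this prints "3 4" and "7", one subset per line.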

You can make use of the Data::PowerSet module which generates all subsets of a list of elements:
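The code for this answer did not survive. As a stand-in that needs no CPAN module, the same power-set idea can be sketched with core Perl by treating each integer from 1 to 2**n - 1 as a bit mask over the elements. This is not Data::PowerSet's API; the sub name subsets_summing_to is hypothetical. Note that the number of masks doubles with every element added, so this only suits small sets.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical helper: enumerate every non-empty subset via a bit mask
# (bit i set => element i included) and keep those that sum to $target.
sub subsets_summing_to {
    my ($target, @elements) = @_;
    my @matches;
    for my $mask (1 .. 2**@elements - 1) {
        my @subset = map { $elements[$_] }
                     grep { $mask & (1 << $_) } 0 .. $#elements;
        push @matches, \@subset if sum(@subset) == $target;
    }
    return @matches;
}

print "@$_\n" for subsets_summing_to(7, 2, 3, 4, 7);
```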

Use Algorithm::Combinatorics. That way, you can decide ahead of time what size subsets you want to consider and keep memory use to a minimum. Apply some heuristics to return early.
#!/usr/bin/perl
use strict; use warnings;
use List::Util qw( sum );
use Algorithm::Combinatorics qw( combinations );
my @x = (1 .. 10);
my $target_sum = 12;
{
    use integer;
    # Try each subset size in turn; combinations() returns an iterator,
    # so only one subset is held in memory at a time.
    for my $n ( 1 .. @x ) {
        my $iter = combinations(\@x, $n);
        while ( my $set = $iter->next ) {
            print "@$set\n" if $target_sum == sum @$set;
        }
    }
}
The numbers do blow up fairly rapidly: a 40-element set has 2**40 (about 1.1 trillion) subsets, and it would take thousands of days to go through all of them. So, you should decide on the interesting sizes of subsets.
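One possible heuristic for choosing the interesting sizes, sketched under the assumption that all numbers are positive: a subset of size n cannot reach the target if even the n largest elements sum to less than it, and cannot avoid overshooting if the n smallest already exceed it. The hypothetical useful_sizes below computes that range with core Perl only; its result could bound the $n loop in the Algorithm::Combinatorics code above.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical helper: returns (min, max) subset sizes worth trying,
# or an empty list if no size can hit $target. Assumes positive numbers.
sub useful_sizes {
    my ($target, @numbers) = @_;
    my @asc  = sort { $a <=> $b } @numbers;
    my @desc = reverse @asc;
    my ($min, $max);
    for my $n (1 .. @asc) {
        my $smallest = sum @asc[0 .. $n - 1];    # least possible sum of n items
        my $largest  = sum @desc[0 .. $n - 1];   # greatest possible sum of n items
        next if $largest < $target || $smallest > $target;
        $min //= $n;
        $max = $n;
    }
    return defined $min ? ($min, $max) : ();
}

my ($lo, $hi) = useful_sizes(12, 1 .. 10);
print "try subset sizes $lo .. $hi\n";
```

For 1 .. 10 and a target of 12 this reports sizes 2 .. 4, so the single-element pass and every pass from size 5 upward can be skipped.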

Is this a 'do my homework for me' question?
To do this deterministically would need an algorithm of order N! (i.e. (N-0) * (N-1) * (N-2)...) which is going to be very slow with large sets of inputs. But the algorithm is very simple: work out each possible sequence of the inputs in the set and try adding up the inputs in the sequence. If at any point the sum matches, you've got one of the answers, save the result and move on to the next sequence. If at any point the sum is greater than the target, abandon the current sequence and move on to the next.
You could optimize this a little by deleting any of the inputs greater than the target. Another approach to optimization would be to take the first input I in the sequence and create a new sequence S1 (the rest of the sequence), deduct I from the target T to get a new target T1, then check if T1 exists in S1; if it does, then you've got a match, otherwise repeat the process with S1 and T1. The order is still N! though.
If you needed to do this with a very large set of numbers then I'd suggest reading up on genetic algorithms.
C.

Someone posted a similar question a while ago and another person showed a neat shell trick to answer it. Here is a shell technique, but I don't think it is as neat a solution as the one I saw before (so I'm not taking credit for this approach). It's cute because it takes advantage of shell expansion:
for i in 0{,+2}{,+3}{,+4}{,+7}; do
y=$(( $i )); # evaluate expression
if [ $y -eq 7 ]; then
echo $i = $y;
fi;
done
Outputs:
0+7 = 7
0+3+4 = 7
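The same brace-expansion trick can be reproduced in Perl, because Perl's glob expands csh-style braces and, when the pattern contains no wildcard characters, returns the expanded strings without needing matching files. This is a small sketch rather than part of the original answer; it sums each string by splitting on + instead of evaluating it.

```perl
use strict;
use warnings;
use List::Util qw( sum );

my $target = 7;
my @matches;

# Each {,+N} group either omits N or appends "+N", so the expansion
# yields one arithmetic string per subset of {2,3,4,7}.
for my $expr (glob '0{,+2}{,+3}{,+4}{,+7}') {
    my $total = sum split /\+/, $expr;   # e.g. "0+3+4" -> (0, 3, 4) -> 7
    if ($total == $target) {
        push @matches, $expr;
        print "$expr = $total\n";
    }
}
```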

Related

Does Perl's Glob have a limitation?

I am running the following expecting return strings of 5 characters:
while (glob '{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}'x5) {
print "$_\n";
}
but it returns only 4 characters:
anbc
anbd
anbe
anbf
anbg
...
However, when I reduce the number of characters in the list:
while (glob '{a,b,c,d,e,f,g,h,i,j,k,l,m}'x5) {
print "$_\n";
}
it returns correctly:
aamid
aamie
aamif
aamig
aamih
...
Can someone please tell me what I am missing here? Is there a limit of some sort, or is there a way around this?
If it makes any difference, it returns the same result in both perl 5.26 and perl 5.28.
The glob first creates all possible file name expansions, so it will first generate the complete list from the shell-style glob/pattern it is given. Only then will it iterate over it, if used in scalar context. That's why it's so hard (impossible?) to escape the iterator without exhausting it; see this post.
In your first example that's 26**5 strings (11_881_376), each five chars long. So a list of ~12 million strings, with a (naive) total in excess of 56 MB ... plus the overhead for each scalar, which I think is at minimum 12 bytes or so. So on the order of 100 MB, at the very least, right there in one list.†
I am not aware of any formal limits on lengths of things in Perl (other than in regex) but glob does all that internally and there must be undocumented limits -- perhaps some buffers are overrun somewhere, internally? It is a bit excessive.
As for a way around this -- generate that list of 5-char strings iteratively, instead of letting glob roll its magic behind the scenes. Then it absolutely should not have a problem.
However, I find the whole thing a bit big for comfort, even in that case. I'd really recommend writing an algorithm that generates and provides one list element at a time (an "iterator"), and working with that.
There are good libraries that can do that (and a lot more), some of which are Algorithm::Loops recommended in a previous post on this matter (and in a comment), Algorithm::Combinatorics (same comment), Set::CrossProduct from another answer here ...
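For illustration, here is a minimal core-Perl iterator of the kind described above. It is a sketch, not one of the libraries just mentioned, and the name make_product_iterator is hypothetical. It keeps one index per character position and advances them like an odometer, so only the current position is held in memory and the caller can stop early without generating the rest.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields the next string of
# $width characters drawn from @$alphabet, or undef once all are exhausted.
sub make_product_iterator {
    my ($alphabet, $width) = @_;
    my @idx  = (0) x $width;    # one alphabet index per character position
    my $done = 0;
    return sub {
        return undef if $done;
        my $word = join '', map { $alphabet->[$_] } @idx;
        # Advance the rightmost index; on overflow reset it and carry left.
        my $pos = $width - 1;
        while ($pos >= 0) {
            last if ++$idx[$pos] < @$alphabet;
            $idx[$pos] = 0;
            $pos--;
        }
        $done = 1 if $pos < 0;  # carried past the leftmost position
        return $word;
    };
}

# Print only the first five of the 26**5 strings, then stop early.
my $next = make_product_iterator([ 'a' .. 'z' ], 5);
my @first;
while (defined(my $word = $next->())) {
    print "$word\n";
    push @first, $word;
    last if @first == 5;
}
```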
Also note that, while this is a clever use of glob, the library is meant to work with files. Apart from misusing it in principle, I think that it will check each of (the ~ 12 million) names for a valid entry! (See this page.) That's a lot of unneeded disk work. (And if you were to use "globs" like * or ? on some systems it returns a list with only strings that actually have files, so you'd quietly get different results.)
† I'm getting 56 bytes for the size of a 5-char scalar. While that is for a declared variable, which may take a little more than an anonymous scalar, in a test program with length-4 strings the actual total size is indeed a good order of magnitude larger than the naively computed one. So the real thing may well be on the order of 1 GB, in one operation.
Update: A simple test program that generates that list of 5-char strings (using the same glob approach) ran for 15-ish minutes on a server-class machine and took 725 MB of memory.
It did produce the right number of actual 5-char long strings, seemingly correct, on this server.
Everything has some limitation.
Here's a pure Perl module that can do it for you iteratively. It doesn't generate the entire list at once and you start to get results immediately:
use v5.10;
use Set::CrossProduct;
my $set = Set::CrossProduct->new( [ ([ 'a'..'z' ]) x 5 ] );
while( my $item = $set->get ) {
say join '', @$item
}

Declaring a Perl array and assigning it values by an array-slice

I was trying to split a string and rearrange the results, all in a single statement:
my $date_str = '15/5/2015';
my @directly_assigned_date_array[2,1,0] = split ('/', $date_str);
This resulted in:
syntax error at Array_slice_test.pl line 16, near "@directly_assigned_date_array["
Why is that an error?
The following works well though:
my @date_array;
@date_array[2,1,0] = split ('/', $date_str);
@vol7ron offered a different way to do it:
my #rvalue_array = (split '/', $date_str)[2,1,0];
And it indeed does the job, but it looks unintuitive, to me at least.
As you are just reversing the split array, you can accomplish the same using this single statement: @date_array = reverse(split('/',$date_str));
Others here know much more about Perl internals than myself, but I assume it cannot perform the operation because an array slice is referencing an element of an array, which does not yet exist. Because the array has not yet been declared, it wouldn't know what address to reference.
my @array = ( split '/', $date_str )[2,1,0];
This works because split returns values in list context. Lists and arrays are very similar in Perl. You could think of an array as a super list, with extra abilities. However you choose to think of it, you can perform a list slice just like an array slice.
In the above code, you're taking the list, then reordering it using the slice and then assigning that to array. It may feel different to think about at first, but it shouldn't be too hard. Generally, you want your data operations (modifications and ordering) to be performed on the rhs of the assignment and your lhs to be the receiving end.
Keep in mind that I've also dropped some parentheses and used Perl's smart order of operation interpreting to reduce the syntax. The same code might otherwise look like the following (same operations, just more fluff):
my @array = ( split( '/', $date_str ) )[2,1,0];
As @luminos mentioned, since you only have 3 elements and you're effectively reversing them, you could use the reverse function; again we can make use of Perl's magic order of operation and drop the parentheses here:
my @array = reverse split '/', $date_str;
But in this case it might be too magical, so depending on your coding practice guidelines, you may want to include a set of parentheses for the split or reverse, if it increases readability and comprehension.
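A short runnable comparison of the forms discussed above, using the same '15/5/2015' input. Because my declares whole variables rather than slices, the slice assignment has to come after the declaration; the other two forms do all the reordering on the right-hand side.

```perl
use strict;
use warnings;

my $date_str = '15/5/2015';

# 1. Declare first, then assign through an array slice.
my @date_array;
@date_array[2, 1, 0] = split '/', $date_str;

# 2. Slice the list returned by split, then assign.
my @sliced = (split '/', $date_str)[2, 1, 0];

# 3. reverse works here only because [2,1,0] happens to be a full reversal.
my @reversed = reverse split '/', $date_str;

print "@date_array | @sliced | @reversed\n";   # 2015 5 15 | 2015 5 15 | 2015 5 15
```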

perl - int() decrementing an integer

Before I get flamed, I want to say I do understand floating point numbers and things of the sort, but that doesn't seem to be my issue.
To simplify things, I'm trying to determine if a number has more than 2 decimal places. I'm doing this by multiplying the number by 100 (stored under variable "test1") and then truncating it with int() ($test2) and comparing it with an if.
$test1 = $number * 100;
$test2 = int($test1);
unless ($test1 == $test2) {
die ("test1:$test1, test2:$test2");
}
The initial $number comes from a whole series of other functions and should realistically be only two decimals, hence I'm trying to catch those that aren't (as a few entries seem to have very many decimals).
However, I just got:
test1:15, test2:14
from my die().
Can someone explain how that would happen? How can int(15) be 14?
From perldoc:
machine representations of floating-point numbers can sometimes produce counterintuitive results. For example, int(-6.725/0.025) produces -268 rather than the correct -269; that's because it's really more like -268.99999999999994315658 instead
So the value that prints as "15" is probably really something like 14.999999999999998, and therefore int truncates it to 14.
Note that perldoc suggests using the POSIX functions floor or ceil instead.
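A small reproduction, assuming an input of 1.15 (the asker's actual number isn't shown): the product prints as 115 because Perl stringifies numbers with about 15 significant digits, but the stored value is slightly below 115, so int() truncates it to 114. The hypothetical has_at_most_two_decimals below rounds with sprintf and compares within a tolerance instead; the tolerance of 1e-9 is an arbitrary assumption.

```perl
use strict;
use warnings;

my $number = 1.15;
my $test1  = $number * 100;   # stored as roughly 114.99999999999999
my $test2  = int($test1);     # int() truncates toward zero

print "test1:$test1, test2:$test2\n";   # test1:115, test2:114

# Hypothetical helper: scale by 100, round to the nearest integer with
# sprintf, and accept the number if it lies within a small tolerance.
sub has_at_most_two_decimals {
    my ($n) = @_;
    my $scaled  = $n * 100;
    my $nearest = sprintf '%.0f', $scaled;
    return abs($scaled - $nearest) < 1e-9;   # tolerance is an assumption
}

print has_at_most_two_decimals(1.15)  ? "ok\n" : "too many decimals\n";
print has_at_most_two_decimals(1.155) ? "ok\n" : "too many decimals\n";
```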
In a simple, one off, case adding 0.5 to your value before int-ing it will give you what you want.
e.g.
int(14.99 + 0.5)
15
It becomes 15.49 and is int-ed "down" to 15, whereas:
int( 14.45 + 0.5 )
becomes 14.95 and still gets int-ed "down" to 14. This is a handy trick, but it doesn't self-document as nicely as using floor and ceil.
As a side note, the Goldberg paper on floating point arithmetic always reminds me how useful it sometimes is to have brains that are not as mindlessly precise as a computer :-)
If I wanted to check if a number had more than two decimal places, I wouldn't do math on it.
my $more_than_two = $number =~ /\d+\.\d{2}\d+\z/;
Before I do that, I might use Scalar::Util's looks_like_number. This method will still fail with floating-point squishiness if you were expecting 14.99999 to be 15.0.
However, you should tell us what you are trying to do instead of how you are trying to do that. It's easier to give better answers.
For your questions about int, I think its documentation tells you what you need to know. The rest is answered in the first couple of questions in perlfaq4.

Need to add two values in perl

Inside an if condition, I take a value from a log file after matching a particular pattern. That pattern is matched two times in the log file: the first time the value is 0 and the second time it is 48. It may also be the reverse: the first value may be 48 and the second 0. I need to calculate the exact value, so I planned to add these two values. But even after adding them, when I print the total inside the if condition I still get the two values separately. I need a single value only.
Please give me solution to solve this issue.
Thanks in advance.
Do you mean something like this:
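my $entry = "First is 10, second is 48";
# Capture every number in the string; avoid naming them $a/$b,
# which are sort's special variables.
if (my ($first, $second) = $entry =~ /(\d+)/g) {
    print $first + $second, "\n"; # 58
}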
But without actual code it is hard to see what your problem really is.
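Without the asker's code this is a guess, but a common cause of getting the two values separately is printing inside the loop that reads the log. A minimal sketch, assuming a hypothetical log format with lines like "value=48" (the pattern and the sample lines are invented for illustration): accumulate inside the loop and print once after it.

```perl
use strict;
use warnings;

# Hypothetical log lines; in practice these would be read from the log file.
my @log_lines = (
    "INFO start",
    "INFO value=0",
    "INFO processing",
    "INFO value=48",
);

my $total = 0;
for my $line (@log_lines) {
    if ($line =~ /value=(\d+)/) {
        $total += $1;        # accumulate each match; don't print here
    }
}
print "$total\n";            # printed once, after all lines are read: 48
```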

Is "map" a loop?

While answering this question, I came to realize that I was not sure whether Perl's map can be considered a loop or not?
On one hand, it quacks/walks like a loop (does O(n) work, can be easily re-written by an equivalent loop, and sort of fits the common definition = "a sequence of instructions that is continually repeated").
On the other hand, map is not usually listed among Perl's control structures, of which loops are a subset of. E.g. http://en.wikipedia.org/wiki/Perl_control_structures#Loops
So, what I'm looking for is a formal reason to be convinced of one side vs. the other. So far, the former (it is a loop) sounds a lot more convincing to me, but I'm bothered by the fact that I never saw "map" mentioned in a list of Perl loops.
map is a higher level concept than loops, borrowed from functional programming. It doesn't say "call this function on each of these items, one by one, from beginning to end," it says "call this function on all of these items." It might be implemented as a loop, but that's not the point -- it also might be implemented asynchronously -- it would still be map.
Additionally, it's not really a control structure in itself -- what if every perl function that used a loop in its implementation were listed under "loops?" Just because something is implemented using a loop, doesn't mean it should be considered its own type of loop.
No, it is not a loop, from my perspective.
Characteristic of (perl) loops is that they can be broken out of (last) or resumed (next, redo). map cannot:
map { last } qw(stack overflow); # ERROR! Can't "last" outside a loop block
The error message suggests that perl itself doesn't consider the evaluated block a loop block.
From an academic standpoint, a case can be made for both depending on how map is defined. If it always iterates in order, then a foreach loop could be emulated by map making the two equivalent. Some other definitions of map may allow out of order execution of the list for performance (dividing the work amongst threads or even separate computers). The same could be done with the foreach construct.
But as far as Perl 5 is concerned, map is always executed in order, making it equivalent to a loop. The internal structure of the expression map $_*2, 1, 2, 3 results in the following execution order opcodes which show that map is built internally as a while-like control structure:
OP enter
COP nextstate
OP pushmark
SVOP const IV 1
SVOP const IV 2
SVOP const IV 3
LISTOP mapstart
LOGOP (0x2f96150) mapwhile <-- while still has items, shift one off into $_
PADOP gvsv GV *_
SVOP const IV 2 loop body
BINOP multiply
goto LOGOP (0x2f96150) <-- jump back to the top of the loop
LISTOP leave
The map function is not a loop in Perl. This can be clearly seen by the failure of next, redo, and last inside a map:
perl -le '@a = map { next if $_ %2; } 1 .. 5; print for @a'
Can't "next" outside a loop block at -e line 1.
To achieve the desired effect in a map, you must return an empty list:
perl -le '@a = map { $_ %2 ? () : $_ } 1 .. 5; print for @a'
2
4
I think transformation is better name for constructs like map. It transforms one list into another. A similar function to map is List::Util::reduce, but instead of transforming a list into another list, it transforms a list into a scalar value. By using the word transformation, we can talk about the common aspects of these two higher order functions.
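A small illustration of that distinction, using List::Util's reduce on arbitrary sample data: map turns a list into another list, while reduce folds a list into a single scalar.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

my @numbers = (1 .. 5);
my @doubled = map { $_ * 2 } @numbers;        # list -> list: 2 4 6 8 10
my $total   = reduce { $a + $b } @doubled;    # list -> scalar: 30

print "@doubled\n$total\n";
```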
That said, it works by visiting every member of the list. This means it behaves much like a loop, and depending on what your definition of "a loop" is it might qualify. Note, my definition means that there is no loop in this code either:
#!/usr/bin/perl
use strict;
use warnings;
my $i = 0;
FOO:
print "hello world!\n";
goto FOO unless ++$i == 5;
Perl actually does define the word loop in its documentation:
loop
A construct that performs something repeatedly, like a roller
coaster.
By this definition, map is a loop because it performs its block repeatedly; however, it also defines "loop control statement" and "loop label":
loop control statement
Any statement within the body of a loop that can make a loop
prematurely stop looping or skip an "iteration". Generally you
shouldn't try this on roller coasters.
loop label
A kind of key or name attached to a loop (or roller coaster) so
that loop control statements can talk about which loop they want to
control.
I believe it is imprecise to call map a loop because next and its kin are defined as loop control statements and they cannot control map.
This is all just playing with words though. Describing map as like-a-loop is a perfectly valid way of introducing someone to it. Even the documentation for map uses a foreach loop as part of its example:
%hash = map { get_a_key_for($_) => $_ } @array;
is just a funny way to write
%hash = ();
foreach (@array) {
$hash{get_a_key_for($_)} = $_;
}
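The two forms from the map documentation can be checked against each other. In this sketch, get_a_key_for is a hypothetical stand-in (it just upper-cases the value) and @array holds arbitrary sample data.

```perl
use strict;
use warnings;

# Hypothetical key function standing in for get_a_key_for() from the docs.
sub get_a_key_for { return uc $_[0] }

my @array = qw( apple banana cherry );

# map version: one expression that builds the whole hash.
my %via_map = map { get_a_key_for($_) => $_ } @array;

# foreach version: the same result built one assignment at a time.
my %via_loop = ();
foreach (@array) {
    $via_loop{ get_a_key_for($_) } = $_;
}

print join(', ', map { "$_=$via_map{$_}" } sort keys %via_map), "\n";
```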
It all depends on the context though. It is useful to describe multiplication to someone as repeated addition when you are trying to get him or her to understand the concept, but you wouldn't want him or her to continue to think of it that way. You would want him or her to learn the rules of multiplication instead of always translating back to the rules of addition.
Your question turns on the issue of classification. At least under one interpretation, asking whether map is a loop is like asking whether map is a subset of "Loop". Framed in this way, I think the answer is no. Although map and Loop have many things in common, there are important differences.
Loop controls: Chas. Owens makes a strong case that Perl loops are subject to loop controls like next and last, while map is not.
Return values: the purpose of map is its return value; with loops, not so much.
We encounter relationships like this all the time in the real world -- things that have much in common with each other, but with neither being a perfect subset of the other.
-----------------------------------------
|Things that iterate?                   |
|                                       |
|   ------------------                  |
|   |map()           |                  |
|   |                |                  |
|   |       ---------|---------         |
|   |       |        |        |         |
|   |       |        |        |         |
|   --------|---------        |         |
|           |                 |         |
|           |             Loop|         |
|           -------------------         |
|                                       |
-----------------------------------------
map is a higher-order function. The same applies to grep. The book Higher-Order Perl explains the idea in full detail.
It's sad to see that discussion moved towards implementation details, not the concept.
FM's and Dave Sherohman's answers are quite good, but let me add an additional way of looking at map.
map is a function which is guaranteed to look at every element of a structure exactly once. And it is not a control structure, as it (itself) is a pure function. In other words, the invariants that map preserves are very strong, much stronger than 'a loop'. So if you can use a map, that's great, because you then get all these invariants 'for free', while if you're using a (more general!) control structure, you'll have to establish all these invariants yourself if you want to be sure your code is right.
And that's really the beauty of a lot of these higher-order functions: you get many more invariants for free, so that you as a programmer can spend your valuable thinking time maintaining application-dependent invariants instead of worrying about low-level implementation-dependent issues.
map itself is generally implemented using a loop of some sort (to loop over iterators, typically), but since it is a higher-level structure, it's often not included in lists of lower-level control structures.
Here is a definition of map as a recurrence:
sub _map (&@) {
    my $f = shift;
    return unless @_;
    return $f->( local $_ = shift @_ ),
           &_map( $f, @_ );   # & call form bypasses the prototype so the code ref $f can be passed
}
}
my #squares = _map { $_ ** 2 } 1..100;
"Loop" is more of a CS term rather than a language-specific one. You can be reasonably confident in calling something a loop if it exhibits these characteristics:
iterates over elements
does the same thing every time
is O(n)
map fits these pretty closely, but it's not a loop because it's a higher-level abstraction. It's okay to say it has the properties of a loop, even if it itself isn't a loop in the strictest, lowest-level sense.
I think map fits the definition of a Functor.
It all depends on how you look at it...
On the one hand, Perl's map can be considered a loop, if only because that's how it's implemented in (current versions of) Perl.
On the other, though, I view it as a functional map and choose to use it accordingly which, among other things, includes only making the assumption that all elements of the list will be visited, but not making any assumptions about the order in which they will be visited. Aside from the degree of functional purity this brings and giving map a reason to exist and be used instead of for, this also leaves me in good shape if some future version of Perl provides a parallelizable implementation of map. (Not that I have any expectation of that ever happening...)
I think of map as more akin to an operator, like multiplication. You could even think of integer multiplication as a loop of additions :). It's not a loop of course, even if it were stupidly implemented that way. I see map similarly.
A map in Perl is a higher-order function that applies a given function to all elements of an array and returns the resulting list.
Whether this is implemented using an iterative loop or by recursion or any other way is not relevant and unspecified.
So a map is not a loop, though it may be implemented using a loop.
Map only looks like a loop if you ignore its return value. You can't do this with a for loop:
print join ' ', map { $_ * $_ } (1 .. 5)
1 4 9 16 25