perlre length limit - perl

From man perlre:
The "*" quantifier is equivalent to "{0,}", the "+" quantifier to "{1,}", and the "?" quantifier to "{0,1}". n and m are limited to integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this:
$_ **= $_ , / {$_} / for 2 .. 42;
Ay that's ugly - Isn't there some constant I can get instead?
Edit: As daxim pointed out (and perlretut hints towards) it might be that 32767 is a magical hardcoded number. A little searching in the Perl code goes a long way, but I'm not sure how to get to the next step and actually find out where the default reg_infty or REG_INFTY is actually set:
~/dev/perl-5.12.2
$ grep -ri 'reg_infty.*=' *
regexec.c: if (max != REG_INFTY && ST.count == max)
t/re/pat.t: $::reg_infty = $Config {reg_infty} // 32767;
t/re/pat.t: $::reg_infty_m = $::reg_infty - 1;
t/re/pat.t: $::reg_infty_p = $::reg_infty + 1;
t/re/pat.t: $::reg_infty_m = $::reg_infty_m; # Surpress warning.
Edit 2: DVK is of course right: It's defined at compile time, and can probably be overridden only with REG_INFTY.

Summary: there are 3 ways I can think of to find the limit: empirical, "matching Perl tests" and "theoretical".
Empirical:
eval {$_ **= $_ , / {$_} / for 2 .. 129};
# To be truly portable, the above should ideally loop forever till $@ is true.
$@ =~ /bigger than (-?\d+) /;
print "LIMIT: $1\n";
This seems obvious enough that it doesn't require explanation.
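The "loop until $@ is true" idea from the comment above could be sketched roughly as follows. The helper name probe_quantifier_limit and the cap of 40 doublings are assumptions made for illustration, and the sketch relies on the wording of the "Quantifier in {,} bigger than ..." error message staying the same.

```perl
use strict;
use warnings;

# Hypothetical helper: keep doubling the quantifier until the regex
# compiler refuses it, then read the limit out of the error message.
sub probe_quantifier_limit {
    my $max_doublings = 40;    # arbitrary safety cap (an assumption)
    my $n = 2;
    for (1 .. $max_doublings) {
        # The pattern is interpolated, so it is compiled at run time
        # and a too-large quantifier dies inside the eval.
        my $ok = eval { my $re = qr/a{$n}/; 1 };
        if (!$ok) {
            return $1 if $@ =~ /bigger than (-?\d+)/;
            die "Unexpected regex error: $@";
        }
        $n *= 2;
    }
    return;    # no limit reached within the cap
}

my $limit = probe_quantifier_limit();
print defined $limit ? "LIMIT: $limit\n" : "No limit found\n";
```

Doubling reaches any 16-bit or 32-bit limit in a few dozen compilations, which is why a small cap is probably enough.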
Matches Perl tests:
Perl has a series of tests for regex, some of which (in pat.t) deal with testing this max value. So, you can approximate that the max value computed in those tests is "good enough" and follow the test's logic:
use Config;
$reg_infty = $Config {reg_infty} // 2 ** 15 - 1; # 32767
print "Test-based reg_infinity limit: $reg_infty\n";
The details below explain where in the tests this comes from.
Theoretical: This is attempting to replicate the EXACT logic used by C code to generate this value.
This is harder than it sounds, because it's affected by 2 things: the Perl build configuration and a bunch of C #define statements with branching logic. I was able to delve fairly deeply into that logic, but stalled on two problems. First, the #ifdefs reference a bunch of tokens that are NOT actually defined anywhere in the Perl code that I can find, and I don't know how to find out from within Perl what those defines' values were. Second, the ultimate default value (assuming I'm right and those #ifdefs always end up with the default) is #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0). (The actual limit is obtained by shifting 1 bit off that all-ones number; details below.)
I'm also not sure how to access, from Perl, the number of bytes in a short for whichever implementation was used to build the perl executable.
So, even if the answer to both those questions can be found (which I'm not sure of), the resulting logic would most certainly be "uglier" and more complex than the straightforward "empirical eval-based" one I offered as the first option.
Below I will provide the details of where various bits and pieces of logic related to this limit live in the Perl code, as well as my attempts to arrive at a "theoretically correct" solution matching the C logic.
OK, here is the investigation part way; you can complete it yourself, as I have to run, or I will complete it later:
From regcomp.c: vFAIL2("Quantifier in {,} bigger than %d", REG_INFTY - 1);
So, the limit is obviously taken from REG_INFTY define. Which is declared in:
regcomp.h:
/* XXX fix this description.
Impose a limit of REG_INFTY on various pattern matching operations
to limit stack growth and to avoid "infinite" recursions.
*/
/* The default size for REG_INFTY is I16_MAX, which is the same as
SHORT_MAX (see perl.h). Unfortunately I16 isn't necessarily 16 bits
(see handy.h). On the Cray C90, sizeof(short)==4 and hence I16_MAX is
((1<<31)-1), while on the Cray T90, sizeof(short)==8 and I16_MAX is
((1<<63)-1). To limit stack growth to reasonable sizes, supply a
smaller default.
--Andy Dougherty 11 June 1998
*/
#if SHORTSIZE > 2
# ifndef REG_INFTY
# define REG_INFTY ((1<<15)-1)
# endif
#endif
#ifndef REG_INFTY
# define REG_INFTY I16_MAX
#endif
Please note that SHORTSIZE is overridable via Config - I will leave details of that out but the logic will need to include $Config{shortsize} :)
From handy.h (this doesn't seem to be part of Perl source at first glance so it looks like an iffy step):
#if defined(UINT8_MAX) && defined(INT16_MAX) && defined(INT32_MAX)
#define I16_MAX INT16_MAX
#else
#define I16_MAX PERL_SHORT_MAX
I could not find ANY place which defined INT16_MAX at all :(
Someone help please!!!
PERL_SHORT_MAX is defined in perl.h:
#ifdef SHORT_MAX
# define PERL_SHORT_MAX ((short)SHORT_MAX)
#else
# ifdef MAXSHORT /* Often used in <values.h> */
# define PERL_SHORT_MAX ((short)MAXSHORT)
# else
# ifdef SHRT_MAX
# define PERL_SHORT_MAX ((short)SHRT_MAX)
# else
# define PERL_SHORT_MAX ((short) (PERL_USHORT_MAX >> 1))
# endif
# endif
#endif
I wasn't able to find any place which defined SHORT_MAX, MAXSHORT or SHRT_MAX so far. So for now I assume the default of ((short) (PERL_USHORT_MAX >> 1)) is what gets used :)
PERL_USHORT_MAX is defined very similarly in perl.h, and again I couldn't find a trace of definition of USHORT_MAX/MAXUSHORT/USHRT_MAX.
Which seems to imply that it's set by default to: #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0). How to extract that value from the Perl side, I have no clue. It's basically the number you get by bitwise negating an unsigned 0 and truncating it to an unsigned short, so if unsigned short is 16 bits, then PERL_USHORT_MAX will be 16 ones, and PERL_SHORT_MAX will be 15 ones, i.e. 2^15-1, i.e. 32767.
Also, from t/re/pat.t (regex tests): $::reg_infty = $Config {reg_infty} // 32767; (to illustrate where the non-default compiled in value is stored).
So, to get your constant, you do:
use Config;
use feature 'say';
my $shortsize = $Config{shortsize} // 2;
my $c_reg_infty = (defined $Config{reg_infty}) ? $Config{reg_infty}
: ($shortsize > 2) ? 2**15-1 # regcomp.h uses ((1<<15)-1) when SHORTSIZE > 2
: get_PERL_SHORT_MAX();
# Where get_PERL_SHORT_MAX() depends on logic for PERL_SHORT_MAX in perl.h
# which I'm not sure how to extract into Perl with any precision
# due to a bunch of never-seen "#define"s and unknown size of "short".
# You can probably do fairly well by simply returning 2**7-1 if shortsize==1
# and 2**15-1 otherwise.
sub get_PERL_SHORT_MAX { return $shortsize == 1 ? 2**7-1 : 2**15-1 } # the rough approximation described above
say "REAL reg_infinity based on C headers: $c_reg_infty";

How to generate a good seed

I'm looking for a method to generate a good seed for generating different series of random numbers in processes that start at the same time.
I would like to avoid using one of the math or crypto libraries because I'm picking random numbers very frequently and my cpu resources are very limited.
I found a few examples for setting seeds. I tested them using the following method:
short program that picks 100 random numbers out of 5000 options. So each value has 2% chance to be selected.
run this program 100 times, so in theory, in a truly random environment, all possible values should be picked at least once.
count the number of values that were not selected at all.
This is the perl code I used. In each test I opt in only one method for generating seed:
#!/usr/bin/perl
#$seed=5432;
#$seed=(time ^ $$);
#$seed=($$ ^ unpack "%L*", `ps axww | gzip -f`);
$seed=(time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
srand ($seed);
for ($i=0 ; $i< 100; $i++) {
printf ("%03d \n", rand (5000)+1000);
}
I ran the program 100 times and counted the values NOT selected using:
# run the program 100 times
for i in `seq 0 99`; do /tmp/rand_test.pl ; done > /tmp/list.txt
# test 1000 values (out of 5000). It should be good-enough representation.
for i in `seq 1000 1999`; do echo -n "$i "; grep -c $i /tmp/list.txt; done | grep " 0" | wc -l
The table shows the result of the tests (Lower value is better):
count   Seed generation method
114     default (the line "srand ($seed);" is commented out)
986     constant seed (5432)
122     time ^ $$
125     $$ ^ unpack "%L*", `ps axww | gzip -f`
163     time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`
The constant seed method showed 986 of 1000 values not selected. In other words, only 1.4% of the possible values were selected. This is close enough to the 2% that was expected.
However, I expected that the last option, which was recommended in a few places, would be significantly better than the default.
Is there any better method to generate a seed for each of the processes?
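As a rough calibration of the test itself, here is a sketch of what an ideal generator would score, assuming every pick is uniform over the 5000 values and independent of the others. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $options = 5000;         # distinct values the test program can print
my $draws   = 100 * 100;    # 100 runs, 100 picks per run
my $sample  = 1000;         # values 1000..1999 checked by the shell loop

# Chance that one particular value is never drawn in all the picks
my $p_missed = (1 - 1 / $options) ** $draws;

# Expected number of checked values that never show up
my $expected_missed = $sample * $p_missed;
printf "Expected values never picked (of %d): %.1f\n", $sample, $expected_missed;
# prints: Expected values never picked (of 1000): 135.3
```

Under those assumptions, a count in the neighbourhood of 135 is what even a perfect seed would give, so the 114 to 163 results in the table may mostly reflect sampling noise rather than seed quality.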
I'm picking random numbers very frequently and my cpu resources are very limited.
You're worrying before you even have made a measurement.
Is there any better method to generate a seed for each of the processes?
Yes. You have to leave user space which is prone to manipulation. Simply use Crypt::URandom.
It is safe for any purpose, including fetching a seed.
It will use the kernel CSPRNG for each operating system (see source code) and hence avoid the problems shown in the article above.
It does not suffer from the documented rand weakness.
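If installing a module is not an option, the same idea can be sketched with core Perl only. This is a stand-in that reads the kernel's random device directly; it is not Crypt::URandom's API, and it assumes a Unix-like system where /dev/urandom exists. The helper name kernel_seed is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch 4 bytes from the kernel's random device
# and turn them into a 32-bit unsigned integer usable as a seed.
# Assumption: /dev/urandom is available (Linux, BSD, macOS).
sub kernel_seed {
    open my $fh, '<:raw', '/dev/urandom'
        or die "Cannot open /dev/urandom: $!";
    my $got = read $fh, my $bytes, 4;
    close $fh;
    die "Short read from /dev/urandom" unless defined $got && $got == 4;
    return unpack 'L', $bytes;
}

srand(kernel_seed());
printf "%04d\n", 1000 + int rand 5000;    # one pick in 1000..5999
```

The seed is read once per process, so the cost is a single small read at startup rather than on every call to rand.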
Don't generate a seed. Let Perl do it for you. Don't call srand (or call it without a parameter if you do).
Quoting the srand documentation:
If srand is not called explicitly, it is called implicitly without a parameter at the first use of the rand operator
and
When called with a parameter, srand uses that for the seed; otherwise it (semi-)randomly chooses a seed.
It doesn't simply use the time as the seed.
$ perl -M5.014 -E'say for srand, srand'
2665271449
1007037147
Your goal seems to be how to generate random numbers rather than how to generate seeds. In most cases, just use a cryptographic RNG (such as Crypt::URandom in Perl) to generate the random numbers you want, rather than generate seeds for another RNG. (In general, cryptographic RNGs take care of seeding and other issues for you.) You should not use a weaker RNG unless—
the random values you generate aren't involved in information security (e.g., the random values are neither passwords nor nonces nor encryption keys), and
either—
you care about repeatable "randomness" (which is not the case here), or
you have measured the performance of your application and find random number generation to be a performance bottleneck.
Since you will generate random names for the purpose of querying a database, which may be in a remote location, it will be highly unlikely that the random number generation itself will be the performance bottleneck.

Why are @@, @!, @, etc. not interpolated in strings?

First, please note that I ask this question out of curiosity, and I'm aware that using variable names like @@ is probably not a good idea.
When using double quotes (or the qq operator), scalars and arrays are interpolated :
$v = 5;
say "$v"; # prints: 5
$@ = 6;
say "$@"; # prints: 6
@a = (1,2);
say "@a"; # prints: 1 2
Yet, with array names of the form @+special char like @@, @!, @,, @%, @; etc, the array isn't interpolated :
@; = (1,2);
say "@;"; # prints nothing
say @; ; # prints: 1 2
So here is my question : does anyone know why such arrays aren't interpolated? Is it documented anywhere?
I couldn't find any information or documentation about that. There are too many articles/posts on google (or SO) about the basics of interpolation, so maybe the answer was just hidden in one of them, or at the 10th page of results..
If you wonder why I could need variable names like those :
The -n (and -p for that matter) flag adds a semicolon ; at the end of the code (I'm not sure it works on every version of perl though). So I can make this program perl -nE 'push@a,1;say"@a"}{say@a' shorter by doing instead perl -nE 'push@;,1;say"@;"}{say@', because that last ; converts say@ to say@;. Well, actually I can't do that because @; isn't interpolated in double quotes. It won't be useful every day of course, but in some golfing challenges, why not!
It can be useful to obfuscate some code. (whether obfuscation is useful or not is another debate!)
Unfortunately I can't tell you why, but this restriction comes from code in toke.c that goes back to perl 5.000 (1994!). My best guess is that it's because Perl doesn't use any built-in array punctuation variables (except for @- and @+, added in 5.6 (2000)).
The code in S_scan_const only interprets @ as the start of an array if the following character is
a word character (e.g. @x, @_, @1), or
a : (e.g. @::foo), or
a ' (e.g. @'foo (this is the old syntax for ::)), or
a { (e.g. @{foo}), or
a $ (e.g. @$foo), or
a + or - (the arrays @+ and @-), but not in regexes.
As you can see, the only punctuation arrays that are supported are @- and @+, and even then not inside a regex. Initially no punctuation arrays were supported; @- and @+ were special-cased in 2000. (The exception in regex patterns was added to make /[\c@-\c_]/ work; it used to interpolate @- first.)
There is a workaround: Because @{ is treated as the start of an array variable, the syntax "@{;}" works (but that doesn't help your golf code because it makes the code longer).
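A small sketch of the rule and the workaround described above, using the array @; from the question. The variable names $plain and $braced are just for illustration.

```perl
use strict;
use warnings;

# Punctuation variables are exempt from "use strict 'vars'".
@; = (1, 2);

# '@' followed by ';' is not on the list of characters that start an
# array inside a string, so the two characters are kept as literal text.
my $plain = "@;";

# '@{' is on the list, so the braced form interpolates the array,
# joining the elements with $" (a single space by default).
my $braced = "@{;}";

print "plain:  [$plain]\n";
print "braced: [$braced]\n";
```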
Perl's documentation says that the result is "not strictly predictable".
The following, from perldoc perlop (Perl 5.22.1), refers to interpolation of scalars. I presume it applies equally to arrays.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
"a $x -> {c}" really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets. because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
Some things are just because "Larry coded it that way". Or as I used to say in class, "It works the way you think, provided you think like Larry thinks", sometimes adding "and it's my job to teach you how Larry thinks."

Can Perl detect if a floating point number has been implicitly rounded?

When I use the code:
(sub {
use strict;
use warnings;
print 0.49999999999999994;
})->();
Perl outputs "0.5".
And when I remove one "9" from the number:
(sub {
use strict;
use warnings;
print 0.4999999999999994;
})->();
It prints 0.499999999999999.
Only when I remove another 9, it actually stores the number precisely.
I know that floating point numbers are a can of worms nobody wants to deal with, but I am curious if there is a way in Perl to "trap" this implicit conversion and die, so that I can use eval to catch this die and let the user know that the number they are trying to pass is not supported by Perl in its native form (so the user can maybe pass a string or an object instead).
The reason why I need this is to avoid a situations like passing 0.49999999999999994 to be rounded by my function, but the number gets converted to 0.5, and in turn gets rounded to 1 instead of 0. I am not sure how to "intercept" this conversion so that my function "knows" that it did not actually get 0.5 as input, but that the user's input was intercepted.
Without knowing how to intercept this kind of conversion, I cannot trust "round" because I do not know whether it received my input as I sent it, or if that input has been modified(at compile time or runtime, not sure) before the function was called(and in turn, the function has no idea if the input it is operating on is the input the user intended or not and has no means to warn the user).
This is not a Perl unique problem, it happens in JavaScript:
(() => {
'use strict';
/* oops: 1 */
console.log(Math.round(0.49999999999999999))
})();
It happens in Ruby:
(Proc.new {
# oops: 1
print (0.49999999999999999.round)
}).call()
It happens in PHP:
<?php
(call_user_func(function() {
/* oops: 1 */
echo round(0.49999999999999999);
}));
?>
It even happens in C (which is okay, but my gcc does not warn me that the number has not been stored precisely. When floating point literals are specified, they had better be stored exactly, or the compiler should warn you that it decided to turn the literal into another form, e.g. "Your number x cannot be represented in 64 bit/32 bit floating point form, so I converted it to y.", so you can see if that's okay or not; in this case it is NOT):
#include <math.h>
#include <stdio.h>
int main(int argc, char **argv)
{
/* oops: 1 */
printf("%f.\n", round(0.49999999999999999));
return 0;
}
Summary:
Is it possible to make Perl show an error or warning on implicit conversions of floating point numbers, or is this something that Perl 5 (along with other languages) is incapable of doing at this moment (e.g. the compiler does not go out of its way to support such warnings or offer a flag to enable them)?
e.g.
warning: the number 0.49999999999999994 is not representable, it has been converted to 0.5. using bigint might solve this. Consider reducing precision of the number.
Perhaps use bignum:
$ perl -Mbignum -le 'print 0.49999999999999994'
0.49999999999999994
$ perl -Mbignum -le 'print 0.49999999999999994+0.1'
0.59999999999999994
$ perl -Mbignum -le 'print 0.49999999999999994-0.1'
0.39999999999999994
$ perl -Mbignum -le 'print 0.49999999999999994+10.1'
10.59999999999999994
It transparently extends Perl's floating point numbers and integers to extended precision.
Be aware that bignum is 150 times slower than internal and other math solutions, and will typically NOT solve your problem (as soon as you need to store your numbers in JSON or databases or whatever, you're back at the same problem again).
Typically, sprintf takes care of prettying your output for you, so you do not have to see the ugly imprecision, however, it's still there.
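To see that the imprecision is mostly in the printing rather than in the stored value, here is a minimal sketch. It assumes a perl built with ordinary IEEE 754 doubles (not long double or quadmath). The value is built arithmetically as 0.5 - 2**-54, which is exactly representable and is the double nearest to the literal 0.49999999999999994.

```perl
use strict;
use warnings;

# The largest double below 0.5 (assumption: 64-bit IEEE 754 doubles)
my $x = 0.5 - 2**-54;

my $default = "$x";                   # default stringification, about 15 significant digits
my $full    = sprintf '%.17g', $x;    # 17 significant digits round-trip a double

print "default: $default\n";          # prints: default: 0.5
print "full:    $full\n";             # prints: full:    0.49999999999999994
print "equal to 0.5? ", ($x == 0.5 ? "yes" : "no"), "\n";    # prints: equal to 0.5? no
```

So a function receiving this value can still tell it apart from 0.5 with a numeric comparison, even though print shows "0.5".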
Here is an example which works on my x64 platform which understands how to deal with that imprecision.
This correctly tells you if the 2 numbers you're interested in are the same:
sub safe_eq {
    my($var1,$var2)=@_;
    return 1 if($var1==$var2);
    my $dust;
    if($var2==0) { $dust=abs($var1); }
    else { $dust= abs(($var1/$var2)-1); }
    return 0 if($dust>5.32907051820076e-15 ); # dust <= 5.32907051820075e-15
    return 1;
}
You can build on top of this to solve all your problems.
It works by understanding the magnitude of the imprecision in your native numbers, and accommodating it.
As you said in the question, dealing with floating-point numbers in code is quite the can of worms, precisely because the standard floating-point representation, regardless of the precision employed, is incapable of accurately representing many decimal numbers. The only 100% reliable way around this is to not use floating-point numbers.
The easiest way to apply that is to instead use fixed-point numbers, although that limits precision to a fixed number of decimal places. e.g., Instead of storing 10.0050, define a convention that all numbers are stored to 4 decimal places and store 100050 instead.
But that doesn't seem likely to satisfy you, based on the minimal explanation you've given for what you're actually trying to accomplish (building a general-purpose math library). The next option, then, would be to store the number of decimal places as a scaling factor with each value. So 10.0050 would become an object containing the data { value => 100050, scale => 4 }.
This can then be extended into a more general "rational number" data type by effectively storing each number as a numerator and denominator, thus allowing you to precisely store numbers such as 1/3, which neither base 2 nor base 10 can represent exactly. This is, incidentally, the approach that I am told Perl 6 has taken. So, if switching to Perl 6 is an option, then you may find that it all Just Works for you once you do so.
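In Perl 5, the numerator/denominator idea can be tried out with the core Math::BigRat module. This is only a sketch of the approach, with the input taken as a string so that it never passes through a native float.

```perl
use strict;
use warnings;
use Math::BigRat;

# 1/3 is stored exactly as a fraction, so multiplying back gives exactly 1.
my $third = Math::BigRat->new('1/3');
my $one   = $third * 3;

# A decimal string is converted to an exact fraction as well.
my $input = Math::BigRat->new('0.49999999999999994');
my $half  = Math::BigRat->new('1/2');

print "1/3 * 3 = $one\n";
print "input below 1/2? ", ($input < $half ? "yes" : "no"), "\n";
```

The trade-off is the same one mentioned for bignum above: exact arithmetic of this kind is much slower than native floating point.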

How do I run the same command multiple times using Perl?

I have 2 commands that I need to run back to back 16 times for 2 sets of data. I have labeled the files used as file#a1_100.gen (set 1) and file#a2_100.gen (set 2). The 100 is then replaced by multiples of 100 up to 1600 (100,200,...,1000,...,1600).
Example 1: For first set
Command 1: perl myprogram1.pl file#a1.pos abc#a1.ref xyz#a1.ref file#a1_100.gen file#a1_100.out
Command 2: perl myprogram2.pl file#a1_100.out file#a1_100.out.long
Example 2: For first set
Command 1: perl myprogram1.pl file#a1.pos abc#a1.ref xyz#a1.ref file#a1_200.gen file#a1_200.out
Command 2: perl myprogram2.pl file#a1_200.out file#a1_200.out.long
These 2 commands are repeated 16 times for both set 1 and set 2. For set 2 the filename changes to File#a2...
I need a command that will run this on its own by changing the filename for the 2 sets, running it 16 times for each set.
Any help will be greatly appreciated! Thanks!
This is probably most easily done with a shell script. As with Perl, TMTOWTDI — there's more than one way to do it.
for num in $(seq 1 16)
do
perl myprogram1.pl file#a1.pos abc#a1.ref xyz#a1.ref file#a1_${num}00.gen file#a1_${num}00.out
perl myprogram2.pl file#a1_${num}00.out file#a1_${num}00.out.long
done
(You could use {1..16} in place of $(seq 1 16) to generate the numbers. You might also note that the # characters in the file names discombobulate the SO Markdown system.)
Or you could use:
for num in $(seq 100 100 1600)
do
perl myprogram1.pl file#a1.pos abc#a1.ref xyz#a1.ref file#a1_${num}.gen file#a1_${num}.out
perl myprogram2.pl file#a1_${num}.out file#a1_${num}.out.long
done
(Bash 4 and later also accept a stepped brace expansion, {100..1600..100}, for that; older shells need seq.)
Or, better, you could use variables to hold values to avoid repetition:
POS="file#a1.pos"
ABC="abc#a1.ref"
XYZ="xyz#a1.ref"
for num in $(seq 100 100 1600)
do
PFX="file#a1_${num}"
GEN="${PFX}.gen"
OUT="${PFX}.out"
LONG="${OUT}.long"
perl myprogram1.pl "${POS}" "${ABC}" "${XYZ}" "${GEN}" "${OUT}"
perl myprogram2.pl "${OUT}" "${LONG}"
done
In this code, the braces around the parameter names are all optional; in the first block of code, the braces around ${num} were mandatory, but optional in the second set. Enclosing names in double quotes is also optional here, but recommended.
Or, if you must do it in Perl, then:
use warnings;
use strict;
my $POS = "file#a1.pos";
my $ABC = "abc#a1.ref";
my $XYZ = "xyz#a1.ref";
for (my $num = 100; $num <= 1600; $num += 100)
{
my $PFX = "file#a1_${num}";
my $GEN = "${PFX}.gen";
my $OUT = "${PFX}.out";
my $LONG = "${OUT}.long";
system("perl", "myprogram1.pl", "${POS}", "${ABC}", "${XYZ}", "${GEN}", "${OUT}");
system("perl", "myprogram2.pl", "${OUT}", "${LONG}");
}
This is all pretty basic coding. And you can guess that it didn't take me long to generate this from the last shell script. Note the use of multiple separate strings instead of one long string in the system calls. That avoids running a shell interpreter — Perl runs perl directly.
You could use $^X instead of "perl" to ensure that you run the same Perl executable as ran the script shown. (If you have /usr/bin/perl on your PATH but you run $HOME/perl/v5.20.1/bin/perl thescript.pl, the difference might matter, but probably wouldn't.)
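The scripts above handle set 1 only. A sketch that covers both sets might look like the following; it assumes set 2 uses the same names with a2 in place of a1 (lower-case "file"), and the helper build_commands is hypothetical. It only builds and prints the argument lists, so nothing is executed until you choose to.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one array reference per command,
# covering both data sets and all 16 multiples of 100.
sub build_commands {
    my @commands;
    for my $set (qw(a1 a2)) {
        my ($pos, $abc, $xyz) = ("file#$set.pos", "abc#$set.ref", "xyz#$set.ref");
        for (my $num = 100; $num <= 1600; $num += 100) {
            my $gen = "file#${set}_${num}.gen";
            my $out = "file#${set}_${num}.out";
            push @commands,
                [ $^X, 'myprogram1.pl', $pos, $abc, $xyz, $gen, $out ],
                [ $^X, 'myprogram2.pl', $out, "$out.long" ];
        }
    }
    return @commands;
}

# Dry run: show what would be executed, one command per line.
print join(' ', @$_), "\n" for build_commands();
```

To run the commands for real, replace the print with something like system(@$_) == 0 or die "Command failed: @$_" inside a loop, which keeps the list form of system and so still avoids the shell.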

Find combinations of numbers that sum to some desired number

I need an algorithm that identifies all possible combinations of a set of numbers that sum to some other number.
For example, given the set {2,3,4,7}, I need to know all possible subsets that sum to x. If x == 12, the answer is {2,3,7}; if x ==7 the answer is {{3,4},{7}} (ie, two possible answers); and if x==8 there is no answer. Note that, as these example imply, numbers in the set cannot be reused.
This question was asked on this site a couple years ago but the answer is in C# and I need to do it in Perl and don't know enough to translate the answer.
I know that this problem is hard (see other post for discussion), but I just need a brute-force solution because I am dealing with fairly small sets.
sub Solve
{
    my ($goal, $elements) = @_;
    # For extra speed, you can remove this next line
    # if @$elements is guaranteed to be already sorted:
    $elements = [ sort { $a <=> $b } @$elements ];
    my (@results, $RecursiveSolve, $nextValue);
    $RecursiveSolve = sub {
        my ($currentGoal, $included, $index) = @_;
        for ( ; $index < @$elements; ++$index) {
            $nextValue = $elements->[$index];
            # Since elements are sorted, there's no point in trying a
            # non-final element unless it's less than goal/2:
            if ($currentGoal > 2 * $nextValue) {
                $RecursiveSolve->($currentGoal - $nextValue,
                                  [ @$included, $nextValue ],
                                  $index + 1);
            } else {
                push @results, [ @$included, $nextValue ]
                    if $currentGoal == $nextValue;
                return if $nextValue >= $currentGoal;
            }
        } # end for
    }; # end $RecursiveSolve
    $RecursiveSolve->($goal, [], 0);
    undef $RecursiveSolve; # Avoid memory leak from circular reference
    return @results;
} # end Solve
my @results = Solve(7, [2,3,4,7]);
print "@$_\n" for @results;
This started as a fairly direct translation of the C# version from the question you linked, but I simplified it a bit (and now a bit more, and also removed some unnecessary variable allocations, added some optimizations based on the list of elements being sorted, and rearranged the conditions to be slightly more efficient).
I've also now added another significant optimization. When considering whether to try using an element that doesn't complete the sum, there's no point if the element is greater than or equal to half the current goal. (The next number we add will be even bigger.) Depending on the set you're trying, this can short-circuit quite a bit more. (You could also try adding the next element instead of multiplying by 2, but then you have to worry about running off the end of the list.)
The rough algorithm is as follows:
have a "solve" function that takes in a list of numbers already included and a list of those not yet included.
This function will loop through all the numbers not yet included.
If adding that number in hits the goal then record that set of numbers and move on,
if it is less than the target recursively call the function with the included/excluded lists modified with the number you are looking at.
else just go to the next step in the loop (since if you are over there is no point trying to add more numbers unless you allow negative ones)
You call this function initially with your included list empty and your yet to be included list with your full list of numbers.
There are optimisations you can do with this such as passing the sum around rather than recalculating each time. Also if you sort your list initially you can do optimisations based on the fact that if adding number k in the list makes you go over target then adding k+1 will also send you over target.
Hopefully that will give you a good enough start. My perl is unfortunately quite rusty.
Pretty much though this is a brute force algorithm with a few shortcuts in it so it's never going to be that efficient.
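The rough algorithm above could be sketched in Perl along these lines. This is one possible reading of the description, not the answerer's own code: it passes the remaining goal around instead of recomputing the sum, and uses a start index in place of an explicit "not yet included" list. It assumes all numbers are positive; the name subsets_summing_to is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: every subset of @$numbers (each element used at
# most once) whose sum equals $target. Assumes positive numbers.
sub subsets_summing_to {
    my ($target, $numbers) = @_;
    my @found;
    my $solve;
    $solve = sub {
        my ($remaining_goal, $included, $start) = @_;
        for my $i ($start .. $#$numbers) {
            my $n = $numbers->[$i];
            if ($n == $remaining_goal) {
                # Adding this number hits the goal: record the subset.
                push @found, [ @$included, $n ];
            }
            elsif ($n < $remaining_goal) {
                # Still below the goal: include it and try the later numbers.
                $solve->($remaining_goal - $n, [ @$included, $n ], $i + 1);
            }
            # Otherwise the number overshoots, so move on to the next one.
        }
    };
    $solve->($target, [], 0);
    undef $solve;    # break the closure's reference to itself
    return @found;
}

print "@$_\n" for subsets_summing_to(7, [2, 3, 4, 7]);
# prints "3 4" and then "7"
```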
You can make use of the Data::PowerSet module which generates all subsets of a list of elements:
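The original example for that module is not reproduced here. As a stand-in that needs no extra module, the same "try every subset" idea can be sketched in core Perl with a bitmask, where bit i of the mask says whether element i is in the subset. This is a swapped-in technique, not Data::PowerSet's API, and the name subsets_by_bitmask is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: enumerate all non-empty subsets of @set via
# bitmasks and keep those whose sum equals $target.
sub subsets_by_bitmask {
    my ($target, @set) = @_;
    my $n = @set;
    my @hits;
    for my $mask (1 .. (1 << $n) - 1) {
        # Pick the elements whose bit is set in this mask.
        my @subset = map { $set[$_] } grep { $mask & (1 << $_) } 0 .. $#set;
        my $sum = 0;
        $sum += $_ for @subset;
        push @hits, \@subset if $sum == $target;
    }
    return @hits;
}

print "@$_\n" for subsets_by_bitmask(7, 2, 3, 4, 7);
# prints "3 4" and then "7"
```

With n elements this visits 2**n - 1 subsets, so it is only practical for the small sets the question mentions.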
Use Algorithm::Combinatorics. That way, you can decide ahead of time what size subsets you want to consider and keep memory use to a minimum. Apply some heuristics to return early.
#!/usr/bin/perl
use strict; use warnings;
use List::Util qw( sum );
use Algorithm::Combinatorics qw( combinations );
my @x = (1 .. 10);
my $target_sum = 12;
{
    use integer;
    for my $n ( 1 .. @x ) {
        my $iter = combinations(\@x, $n);
        while ( my $set = $iter->next ) {
            print "@$set\n" if $target_sum == sum @$set;
        }
    }
}
The numbers do blow up fairly rapidly: It would take thousands of days to go through all subsets of a 40 element set. So, you should decide on the interesting sizes of subsets.
Is this a 'do my homework for me' question?
To do this deterministically would need an algorithm of order N! (i.e. (N-0) * (N-1) * (N-2)...) which is going to be very slow with large sets of inputs. But the algorithm is very simple: work out each possible sequence of the inputs in the set and try adding up the inputs in the sequence. If at any point the sum matches, you've got one of the answers, save the result and move on to the next sequence. If at any point the sum is greater than the target, abandon the current sequence and move on to the next.
You could optimize this a little by deleting any of the inputs greater than the target. Another approach for optimization would be to take the first input I in the sequence and create a new sequence S1, deduct I from the target T to get a new target T1, then check if T1 exists in S1; if it does then you've got a match, otherwise repeat the process with S1 and T1. The order is still N! though.
If you needed to do this with a very large set of numbers then I'd suggest reading up on genetic algorithms.
C.
Someone posted a similar question a while ago and another person showed a neat shell trick to answer it. Here is a shell technique, but I don't think it is as neat a solution as the one I saw before (so I'm not taking credit for this approach). It's cute because it takes advantage of shell expansion:
for i in 0{,+2}{,+3}{,+4}{,+7}; do
y=$(( $i )); # evaluate expression
if [ $y -eq 7 ]; then
echo $i = $y;
fi;
done
Outputs:
0+7 = 7
0+3+4 = 7