How do I factor integers using Perl? - perl

I want to split integers into their factors. For example, if the total number of records is:
169 - (13 x 13)
146 - (73 x 2)
150 - (50 x 3)
175 - (25 x 7)
168 - (84 x 2)
160 - (80 x 2)
When it's more than 10k, I want everything in units of 1000.
When it's more than 100k, I want everything in units of 10k.
That is how I want to factor the number. How can I achieve this? Is there any Perl module available for these kinds of number operations?
Suppose the total number of records is 10k. It should be split as 1000 x 10 only, not by 100s or 10s.
I can use the sqrt function, but it doesn't always give what I expect. If I give the input 146, I have to get (73, 2).

You can use the same algorithms you find for other languages in Perl. There isn't any Perl special magic in the ideas. It's just the implementation, and for something like this problem, it's probably going to look very similar to the implementation in any language.
What problem are you trying to solve? Maybe we can point you at the right algorithm if we know what you are trying to do:
Why must numbers over 10,000 use the 1,000 factor? Most numbers won't have a 1,000 factor.
Do you want all the factors, or just the largest and its companion?
What do you mean that the sqrt function doesn't work as you expect? If you're following the common algorithm, you just need to iterate up to the floor of the square root to test for factors. Most integers don't have an integral square root.
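A minimal sketch of that square-root loop, finding the divisor pair around sqrt(n) (the sub name factor_pair is just for illustration):
sub factor_pair {
    my ($n) = @_;
    for my $d ( reverse 2 .. int sqrt $n ) {
        return ( $n / $d, $d ) if $n % $d == 0;
    }
    return ( $n, 1 );    # $n is prime (or 1): only the trivial pair is left
}

print join( ', ', factor_pair(146) ), "\n";    # 73, 2
print join( ', ', factor_pair(169) ), "\n";    # 13, 13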

If the number is not a prime you can use a factoring algorithm.
There is an example of such a function here: http://www.classhelper.org/articles/perl-by-example-factoring-numbers/factoring-numbers-with-perl.shtml
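In case that link moves, here is a rough trial-division sketch (not the code from that page) that returns the prime factors with multiplicity:
sub prime_factors {
    my ($n) = @_;
    my @factors;
    my $d = 2;
    while ( $d * $d <= $n ) {
        while ( $n % $d == 0 ) {
            push @factors, $d;    # $d is prime here: smaller factors were already divided out
            $n /= $d;
        }
        $d++;
    }
    push @factors, $n if $n > 1;  # whatever is left is itself a prime factor
    return @factors;
}

print join( ' x ', prime_factors(150) ), "\n";    # 2 x 3 x 5 x 5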

Loop through candidate widths in an acceptable range (say, 9 to 15), compute the total modulo each width, and choose the width with the smallest remainder.
sub compute_width {
    my ($total_records) = @_;

    my %remainders;
    for ( my $width = 9; $width <= 15; $width += 1 ) {
        my $remainder = $total_records % $width;
        $remainders{$width} = $remainder;
    }

    my @widths = sort {
        $remainders{$a} <=> $remainders{$b}
            ||
        $a <=> $b
    } keys %remainders;

    return $widths[0];
}
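Called with the totals from the question, it picks the width in 9..15 with the smallest remainder:
print compute_width(169), "\n";    # 13 (169 % 13 == 0)
print compute_width(150), "\n";    # 10 (150 % 10 == 0)
print compute_width(146), "\n";    # 9  (nothing in 9..15 divides 146; 9 and 12 tie with remainder 2, and the narrower width wins)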

Related

Adding an array of floats produces weird sums adding forward vs. backward

I'm adding (summing) an array of floats in Perl, and I was trying to speed it up. When I did, I started getting weird results.
#!/usr/bin/perl
my $total = 0;
my $sum   = 0;

# Compute $sum (adds from index 0 forward)
my @y = @{ $$self{"closing"} }[ -$periods .. -1 ];
my @x = map { ${$_}{$what} } @y;
# map { $sum += $_ } @x;
$sum += $_ for @x;

# Compute $total (adds from index -1 backward)
for ( my $i = -1; $i >= -$periods; $i-- ) {
    $total += ${ ${ $$self{"closing"} }[$i] }{$what};
}

if ( $total != $sum ) {
    printf( "SMA($what, $periods) total ($total) != sum ($sum) %g\n",
        ( $total - $sum ) );
}

# Example output:
# SMA(close, 20) total (941.03) != sum (941.03) -2.27374e-13
I seem to get different answers when I compute $sum and $total.
The only thing I can think of is that one method adds forward through the array, and the other backward.
Would this cause them to overflow differently? I would expect so, but it never occurred to me that I would get different answers. Notice that the difference is small (-2.27374e-13).
Is this what's going on, or is my code busted?
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
As Eric mentioned in the comments, floating point arithmetic is not associative, so the order in which you do the operations will affect the answer.
While "add smaller values first" is good advice, it is important to emphasize that you can have differences even with just regular "small" values. Here's one example:
x = 1.004028
y = 3.0039678
z = 4.000855
If these are taken to be IEEE-754 single-precision floats (i.e., 32-bit binary format), then we get:
x + (y+z) = 8.008851
(x+y) + z = 8.00885
The infinitely precise result is 8.0088508, so neither is very good! And the error isn't insignificant for scientific computations, and it accumulates.
This is a rich field with many numerical algorithms to ensure precision. Which one you pick depends entirely on your problem domain and the particular needs and resources you have available, but one of the best-known is Kahan's summation algorithm; see https://en.wikipedia.org/wiki/Kahan_summation_algorithm. You can easily adapt it to your problem for (hopefully) better results.
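Applied to the Perl loop in the question, a compensated (Kahan) sum is a small change; here is a generic sketch (kahan_sum is just an illustrative name, not a library function):
# Compensated (Kahan) summation: a running correction term keeps track of
# the low-order bits lost when a small value is added to a large total.
sub kahan_sum {
    my ( $sum, $c ) = ( 0, 0 );
    for my $v (@_) {
        my $y = $v - $c;              # apply the correction carried over from the last step
        my $t = $sum + $y;            # big + small: low-order bits of $y may be lost here...
        $c    = ( $t - $sum ) - $y;   # ...and this recovers them for the next iteration
        $sum  = $t;
    }
    return $sum;
}

# Drop-in replacement for the question's forward loop:
#   $sum += $_ for @x;      becomes      my $sum = kahan_sum(@x);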

How do I display a large number in scientific notation?

Using AutoIt, when I multiply 1 by 10^21, I get 1e+021. But in separate steps, such as multiplying 1 by 10^3 seven times, I get the overflow value of 3875820019684212736.
It appears AutoIt cannot handle numbers with more than eighteen digits. Is there a way around this? For example, can I multiply 10,000,000,000,000,000 by 1000 and have the result displayed as 1e+019?
Try this UDF: BigNum UDF
Example:
$X = "9999999999999999999999999999999"
$Y = "9999999999999999999999999999999"
$product = _BigNum_Mul($X, $Y)

arc4random() and arc4random_uniform() not really random?

I have been using arc4random() and arc4random_uniform(), and I always had the feeling that they weren't exactly random. For example, I was randomly choosing values from an array, but the values that came out were often the same when I generated them multiple times in a row. So today I thought I would use an Xcode playground to see how these functions behave. I first tested arc4random_uniform, generating a number between 0 and 4 with this code:
import Cocoa

var number = 0
for i in 1...20 {
    number = Int(arc4random_uniform(5))
}
And I ran it several times; here is how the values evolved most of the time:
As you can see, the values increase and decrease repeatedly, and once they reach the maximum/minimum they often stay there for a while (in the first screenshot, at the 5th step, the value stays at 3 for 6 steps). The problem is that this isn't unusual at all; the function actually behaves this way most of the time in my tests.
Now, if we look at arc4random(), it's basically the same:
So here are my questions:
Why is this function behaving in this way?
How can I make it more random?
Thank you.
EDIT:
Finally, I did two experiments whose results surprised me, the first one with a real die:
What surprised me is that I wouldn't have said it was random, since I saw the same sort of pattern that I had described as non-random for arc4random() & arc4random_uniform(). So, as Jean-Baptiste Yunès pointed out, humans aren't good at telling whether a sequence of numbers is really random.
I also wanted to do a more "scientific" experiment, so I wrote this code:
import Foundation

var appeared = [0,0,0,0,0,0,0,0,0,0,0]
var numberOfGenerations = 1000
for _ in 1...numberOfGenerations {
    let randomNumber = Int(arc4random_uniform(11))
    appeared[randomNumber]++
}
for (number, numberOfTimes) in enumerate(appeared) {
    println("\(number) appeared \(numberOfTimes) times (\(100.0 * Double(numberOfTimes) / Double(numberOfGenerations))%)")
}
This was to see how many times each number appeared, and indeed the numbers look randomly generated. For example, here is one output from the console:
0 appeared 99 times.
1 appeared 97 times.
2 appeared 78 times.
3 appeared 80 times.
4 appeared 87 times.
5 appeared 107 times.
6 appeared 86 times.
7 appeared 97 times.
8 appeared 100 times.
9 appeared 91 times.
10 appeared 78 times.
So it's definitely OK 😊
EDIT #2: I did the die experiment again with more rolls, and it's still just as surprising to me:
A truly random sequence of numbers cannot be generated by an algorithm; algorithms can only produce a pseudo-random sequence of numbers (something that looks like a random sequence). So depending on the algorithm chosen, the quality of the "randomness" may vary. The sequences produced by arc4random() are generally considered to have good randomness.
You cannot analyze the randomness of a sequence visually... Humans are very bad at detecting randomness! They tend to find structure where there is none. Nothing is really wrong in your diagrams (except the rare run of six 3s in a row, but that is randomness: sometimes unusual things happen). You would be surprised if you used a die to generate a sequence and drew its graph. Beware that a sample of only 20 numbers cannot be seriously analyzed for randomness; you need much bigger samples.
If you need some other kind of randomness, you can try the /dev/random pseudo-file, which produces random bytes each time you read from it. The sequence is generated by a mix of algorithms and external physical events that happen in your computer.
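A rough Perl sketch of that idea, using /dev/urandom (the non-blocking variant of /dev/random) so the read never stalls:
# Pull 4 bytes from the kernel randomness pool and turn them into a 1..6 roll.
open my $fh, '<:raw', '/dev/urandom' or die "cannot open /dev/urandom: $!";
read $fh, my $bytes, 4;
close $fh;

my $n    = unpack 'L', $bytes;    # unsigned 32-bit integer
my $roll = 1 + $n % 6;            # plain modulo has a tiny bias, negligible for a die
print "$roll\n";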
It depends on what you mean when you say random.
As stated in the comments, true randomness is clumpy. Long strings of repeats or close values are expected.
If this doesn't fit your requirement, then you need to better define your requirement.
Other options could include using a shuffle algorithm to dis-order things in an array (see the sketch below), or using a low-discrepancy sequence to give a more even distribution of values.
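For the shuffle option, the standard Fisher-Yates pass looks like this (sketched in Perl; List::Util's shuffle does the same job):
# Fisher-Yates shuffle: walk backward through the array, swapping each slot
# with a randomly chosen slot at or before it. Every element appears exactly once.
sub fisher_yates_shuffle {
    my @deck = @_;
    for ( my $i = $#deck; $i > 0; $i-- ) {
        my $j = int rand( $i + 1 );
        @deck[ $i, $j ] = @deck[ $j, $i ];
    }
    return @deck;
}

my @order = fisher_yates_shuffle( 1 .. 6 );    # 1..6, each exactly once, in random order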
I don't really agree with the idea that humans are very bad at detecting randomness.
Would you be satisfied if you obtained 1-1-2-2-3-3-4-4-5-5-6-6 after throwing six pairs of dice? Yet the dice frequencies are perfect…
This is exactly the problem I'm encountering with the arc4random and arc4random_uniform functions.
I have been developing a backgammon application for many years, based on a neural network trained against world-champion players. I DO know that it plays much better than anyone, but many users think it is cheating. I sometimes have doubts too, so I decided to throw all the dice myself…
I'm not at all satisfied with arc4random, even though the frequencies are OK.
I always throw a pair of dice, and the results lead to unacceptable situations, for example: five consecutive doubles for the same player, or waiting 12 turns (24 dice) until the first 6 occurs.
It is easy to test (Objective-C code):
void randomDices ( int * dice1, int * dice2, int player )
{
    ( * dice1 ) = arc4random_uniform ( 6 ) ;    // note: arc4random_uniform(6) yields 0..5
    ( * dice2 ) = arc4random_uniform ( 6 ) ;

    // Add to your statistics
    [self didRandomDice1:( * dice1 ) dice2:( * dice2 ) forPlayer:player] ;
}
Maybe arc4random doesn't like being called twice within a short time…
So I tried several solutions and finally chose this code, which runs a second level of randomization on top of arc4random_uniform:
int CFRandomDice ()
{
    int __result = -1 ;
    BOOL __found = NO ;
    while ( ! __found )
    {
        // random int big enough but not too big
        int __bigint = arc4random_uniform ( 10000 ) ;

        // Searching for the first character between '1' and '6'
        // in the string version of bigint :
        NSString * __bigString = @( __bigint ).stringValue ;
        NSInteger __nbcar = __bigString.length ;
        NSInteger __i = 0 ;
        while ( ( __i < __nbcar ) && ( ! __found ) )
        {
            unichar __ch = [__bigString characterAtIndex:__i] ;
            if ( ( __ch >= '1' ) && ( __ch <= '6' ) )
            {
                __found = YES ;
                __result = __ch - '1' + 1 ;
            }
            else
            {
                __i++ ;
            }
        }
    }
    return ( __result ) ;
}
This code creates a random number with arc4random_uniform(10000), converts it to a string, and then searches for the first digit between '1' and '6' in that string.
This appeared to me to be a very good way to randomize the dice because:
1/ the frequencies are OK (see the statistics below);
2/ exceptional dice sequences occur at exceptional times.
10000-dice test:
----------
Game Stats
----------
HIM :
Total 1 = 3297
Total 2 = 3378
Total 3 = 3303
Total 4 = 3365
Total 5 = 3386
Total 6 = 3271
----------
ME :
Total 1 = 3316
Total 2 = 3289
Total 3 = 3282
Total 4 = 3467
Total 5 = 3236
Total 6 = 3410
----------
HIM doubles = 1623
ME doubles = 1648
Now I’m sure that players won’t complain…

How can I implement the Gale-Shapley stable marriage algorithm in Perl?

Problem statement:
We have an equal number of men and women. Each man has a preference score toward each woman, and so do the women for each man. Each man and each woman has certain interests, and based on those interests we calculate the preference scores.
Initially, we have an input file with x columns. The first column is the person (man/woman) ID; IDs are simply the numbers 0 .. n (the first half are men, the second half women). The remaining x-1 columns hold the interests, which are also integers.
Now, using this n by x-1 matrix, we have come up with an n by n/2 matrix. The new matrix has all men and women as its rows and scores for the opposite sex in its columns.
We have to sort the scores in descending order, and we also need to know which person ID each score belongs to after sorting.
So here I wanted to use a hash table.
Once we get the scores we need to make up pairs, for which we need to follow some rules.
My trouble is with the second, n by n/2 matrix, which needs to say how much preference each man/woman has for each woman/man. I need these scores sorted so that I know who is the first preferred woman/man, the second preferred, and so on for each man/woman.
I hope to get good suggestions on the data structures I use. I prefer PHP or Perl.
NB:
This is not homework. It is a slightly modified version of the stable marriage algorithm. I have a working solution; I am only working on optimizing my code.
It is very similar to the stable marriage problem, but here we need to calculate the scores based on the interests people share. I have implemented it the way it is described on the wiki page http://en.wikipedia.org/wiki/Stable_marriage_problem.
My problem is not solving the problem. I solved it and can run it. I am just trying to find a better solution, so I am asking for suggestions on the type of data structure to use.
Conceptually, I tried using an array of hashes, where the array index gives the person ID and the hash in it maps partner IDs to scores. I start with an array of hashes, then sort each hash by value, but I could not store the sorted hashes back in an array. So I just stored the keys after sorting and used them to look up the values in my initial, unsorted hashes.
Can we store the hashes after sorting?
Can you suggest a better structure?
I think the following implements the Gale-Shapley algorithm where each person's preference ordering is given as an array of scores over the members of the opposite sex.
As an aside, I just found out that David Gale passed away (see his Wikipedia entry — he will be missed).
The code is wordy; I just quickly transcribed the algorithm as described on Wikipedia and did not check original sources, but it should give you an idea of how to use appropriate Perl data structures. If the dimensions of the problem grow, profile first before trying to optimize.
I am not going to try to address the specific issues in your problem. In particular, you did not fully flesh out the idea of computing a match score based on interests, and trying to guess is bound to be frustrating.
#!/usr/bin/perl
use strict; use warnings;
use YAML;

my (%pref, %people, %proposed_by);

while ( my $line = <DATA> ) {
    my ($sex, $id, @pref) = split ' ', $line;
    last unless $sex and $sex =~ /^(m|w)\z/;
    $pref{$sex}{$id} = [ map 0 + $_, @pref ];
    $people{$sex}{$id} = undef;
}

while ( defined( my $man = bachelor($people{m}) ) ) {
    my @women = eligible_women($people{w}, $proposed_by{$man});
    next unless @women;

    my $woman = argmax($pref{m}{$man}, \@women);
    $proposed_by{$man}{$woman} = 1;

    if ( defined( my $jilted = $people{w}{$woman}{m} ) ) {
        my $proposal_score = $pref{w}{$woman}[$man];
        my $jilted_score   = $pref{w}{$woman}[$jilted];
        next if $proposal_score < $jilted_score;
        $people{m}{$jilted}{w} = undef;
    }

    $people{m}{$man}{w}   = $woman;
    $people{w}{$woman}{m} = $man;
}

print Dump \%people;

sub argmax {
    my ($pref, $candidates) = @_;
    my ($ret) = sort { $pref->[$b] <=> $pref->[$a] } @$candidates;
    return $ret;
}

sub bachelor {
    my ($men) = @_;
    my ($bachelor) = grep { not defined $men->{$_}{w} } keys %$men;
    return $bachelor;
}

sub eligible_women {
    my ($women, $proposed_to) = @_;
    return grep { not defined $proposed_to->{$_} } keys %$women;
}
__DATA__
m 0 10 20 30 40 50
m 1 50 30 40 20 10
m 2 30 40 50 10 20
m 3 10 10 10 10 10
m 4 50 40 30 20 10
w 0 50 40 30 20 10
w 1 40 30 20 10 50
w 2 30 20 10 50 40
w 3 20 10 50 40 30
w 4 10 50 40 30 20
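On the narrower question of storing a "sorted hash": Perl hashes have no order, so the usual approach is to leave the scores in a hash and keep, per person, an array of partner IDs sorted by score. A rough sketch with made-up IDs and scores:
# %score holds raw preference scores: $score{$person}{$partner} = score.
my %score = (
    0 => { 3 => 9, 4 => 2, 5 => 7 },
    1 => { 3 => 1, 4 => 8, 5 => 5 },
);

# For each person, keep an array of partner IDs sorted best-first.
# The hash stays unsorted; the array carries the ordering.
my %pref_order;
for my $person ( keys %score ) {
    $pref_order{$person} = [
        sort { $score{$person}{$b} <=> $score{$person}{$a} }
        keys %{ $score{$person} }
    ];
}

# $pref_order{0} is now [ 3, 5, 4 ]; raw scores are still looked up through %score.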

Generate a hash sum for several integers

I have several integers, and I need to generate one value from them. For example:
Int 1: 14
Int 2: 4
Int 3: 8
Int 4: 4
Hash Sum: 43
I have some restrictions on the values: the maximum value an attribute can have is 30, the sum of all of them is always 30, and the attributes are always positive.
The key is that I want to generate the same hash sum for similar sets of integers. For example, if I have the integers 14, 4, 10, 2, then I want to generate the same hash sum as in the case above, 43. But of course, if the integers are very different (4, 4, 2, 20), then I should get a different hash sum. It also needs to be fast.
Ideally, I would like the output of the hash sum to be between 0 and 512, and it should be evenly distributed. With my restrictions I can have around 5K different possibilities, so I would like to have around 10 per bucket.
I am sure there are many algorithms that do this, but I could not find a way of googling this thing. Can anyone please post an algorithm to do this?
Some more information
The point is that those integers are attributes of a function. I want to store the values of the function in a table, but I do not have enough memory to store all the different options. That is why I want to generalize across similar attributes.
The reason why 10, 5, 15 is totally different from 5, 10, 15 is that if you imagine this in 3D, the two points are completely different points.
Some more information 2
Some answers try to solve the problem using hashing, but I do not think this is so complex. Thanks to one of the comments I have realized that this is a clustering problem. If we have only 3 attributes and imagine the problem in 3D, what I need is just to divide the space into blocks.
In fact this can be solved with rules of this type:
if (att[0] < 5 && att[1] < 5 && att[2] < 5 && att[3] < 5)
    Block = 21
if ((5 < att[0] < 10) && (5 < att[1] < 10) && (5 < att[2] < 10) && (5 < att[3] < 10))
    Block = 45
The problem is that I need a fast and general way to generate those ifs; I cannot write out all the possibilities.
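A general way to get those blocks without enumerating the ifs is to integer-divide each attribute by a block width and combine the slice indices. A rough Perl sketch, where the block width of 8 (so 4 slices per axis and 256 buckets) is an arbitrary choice:
# Map a point (4 attributes, each 0..30) to a bucket 0..255
# by cutting each axis into 4 slices of width 8.
sub bucket_for {
    my @att   = @_;
    my $width = 8;                                   # block size per axis
    my $index = 0;
    for my $a (@att) {
        $index = $index * 4 + int( $a / $width );    # 4 slices per axis
    }
    return $index;
}

print bucket_for( 14, 4, 8, 4 ),  "\n";    # slices (1,0,1,0) -> bucket 68
print bucket_for( 14, 4, 10, 2 ), "\n";    # same slices, same bucket: 68
print bucket_for( 4, 4, 2, 20 ),  "\n";    # slices (0,0,0,2) -> bucket 2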
The simple solution:
Convert the integers to strings separated by commas, and hash the resulting string using a common hashing algorithm (md5, sha, etc).
If you really want to roll-your-own, I would do something like:
Generate large prime P
Generate random numbers 0 < a[i] < P (for each dimension you have)
To generate hash, calculate: sum(a[i] * x[i]) mod P
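A quick Perl sketch of that roll-your-own recipe (the prime and the coefficients below are arbitrary placeholders); note that a hash like this scatters nearby inputs rather than grouping them:
# h(x) = ( sum a_i * x_i ) mod P, folded down to the 0..511 range from the question.
my $P = 104729;                          # a large prime (arbitrary choice)
my @a = ( 12345, 6789, 4242, 999 );      # fixed coefficients with 0 < a[i] < P, one per attribute

sub hash_attrs {
    my @x   = @_;
    my $sum = 0;
    $sum += $a[$_] * $x[$_] for 0 .. $#x;
    return ( $sum % $P ) % 512;
}

print hash_attrs( 14, 4, 8, 4 ), "\n";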
Given the inputs a, b, c, and d, each ranging in value from 0 to 30 (5 bits), the following will produce a number in the range 0 to 255 (8 bits).
bucket = ((a & 0x18) << 3) | ((b & 0x18) << 1) | ((c & 0x18) >> 1) | ((d & 0x18) >> 3)
Whether the general approach is appropriate depends on how the question is interpreted. The 3 least significant bits are dropped, grouping 0-7 in the same set, 8-15 in the next, and so forth.
0-7,0-7,0-7,0-7 -> bucket 0
0-7,0-7,0-7,8-15 -> bucket 1
0-7,0-7,0-7,16-23 -> bucket 2
...
24-30,24-30,24-30,24-30 -> bucket 255
Trivially tested with:
#include <stdio.h>

int main(void) {
    for (int a = 0; a <= 30; a++)
        for (int b = 0; b <= 30; b++)
            for (int c = 0; c <= 30; c++)
                for (int d = 0; d <= 30; d++) {
                    int bucket = ((a & 0x18) << 3) |
                                 ((b & 0x18) << 1) |
                                 ((c & 0x18) >> 1) |
                                 ((d & 0x18) >> 3);
                    printf("%d, %d, %d, %d -> %d\n",
                           a, b, c, d, bucket);
                }
    return 0;
}
You want a hash function that depends on the order of inputs and where similar sets of numbers will generate the same hash? That is, you want 50 5 5 10 and 5 5 10 50 to generate different values, but you want 52 7 4 12 to generate the same hash as 50 5 5 10? A simple way to do something like this is:
long hash = 13;
for (int i = 0; i < array.length; i++) {
    hash = hash * 37 + array[i] / 5;
}
This is imperfect, but should give you an idea of one way to implement what you want. It will treat the values 50 - 54 as the same value, but it will treat 49 and 50 as different values.
If you want the hash to be independent of the order of the inputs (so the hash of 5 10 20 and 20 10 5 are the same) then one way to do this is to sort the array of integers into ascending order before applying the hash. Another way would be to replace
hash = hash * 37 + array[i] / 5;
with
hash += array[i] / 5;
EDIT: Taking into account your comments in response to this answer, it sounds like my attempt above may serve your needs well enough. It won't be ideal, nor perfect. If you need high performance you have some research and experimentation to do.
To summarize, order is important, so 5 10 20 differs from 20 10 5. Also, you would ideally store each "vector" separately in your hash table, but to handle space limitations you want to store some groups of values in one table entry.
An ideal hash function would return a number evenly spread across the possible values based on your table size. Doing this right depends on the expected size of your table, the number of input values, and their expected maximum. If you can have negative values as "coordinate" values, then this may affect how you compute your hash. If, given your range of input values and the hash function chosen, your maximum hash value is less than your hash table size, then you need to change the hash function to generate a larger hash value.
You might want to try using vectors to describe each number set as the hash value.
EDIT:
Since you haven't described why you don't want to simply run the function itself, I'm guessing it's long-running, and you also haven't described the breadth of the argument set.
If every value is expected, then a full lookup table in a database might be faster.
If you're expecting repeated calls with the same arguments and little overall variation, then you could look at memoizing, so only the first run for an argument set is expensive and each additional request is fast, with less memory usage.
You would need to define what you mean by "similar". Hashes are generally designed to create unique results from unique input.
One approach would be to normalize your input and then generate a hash from the results.
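One reading of "normalize, then hash", as a Perl sketch: quantize each attribute into a coarse bucket first (the width of 5 here is an arbitrary choice), then digest the normalized tuple with Digest::MD5:
use Digest::MD5 qw(md5_hex);

# Quantize each attribute to a coarse bucket, then hash the normalized tuple,
# so inputs that quantize the same way share a key.
sub similarity_key {
    my @normalized = map { int( $_ / 5 ) } @_;
    return md5_hex( join ',', @normalized );
}

print similarity_key( 14, 4, 8, 4 ), "\n";
# 14,4,8,4 quantizes to (2,0,1,0) while 14,4,10,2 gives (2,0,2,0), so the bucket
# width controls how much difference still maps onto the same key.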
Generating the same hash sum is called a collision, and is a bad thing for a hash to have. It makes it less useful.
If you want similar values to give the same output, you can divide the input by however close you want them to count. If the order makes a difference, use a different divisor for each number. The following function does what you describe:
int SqueezedSum( int a, int b, int c, int d )
{
    return (a/11) + (b/7) + (c/5) + (d/3);
}
This is not a hash, but does what you describe.
You want to look into geometric hashing. In "standard" hashing you want
a short key
inverse resistance
collision resistance
With geometric hashing you substitute property 3 with something which is almost the opposite: namely, close initial values give close hash values.
Another way to view my problem is through multidimensional scaling (MDS). In MDS we start with a matrix of items, and what we want is to assign each item a location in an N-dimensional space, reducing the number of dimensions in this way.
http://en.wikipedia.org/wiki/Multidimensional_scaling