How many random strings does this code generate? - perl

I am considering this random string generator in perl:
sub generate_random_string {
my $length = 12;
my @chars = qw/2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T U V W X Y Z/;
my $str = '';
$str .= $chars[int rand @chars] for 1..$length;
return $str;
}
How many unique strings will this generate? If I extend the length of the string, how many more unique strings are available?
Also, how do I calculate the probability of generating the same string twice (assuming the length of the string stays at 12)?

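With 31 characters to choose from at each of 12 positions, the code can generate 31 ^ 12 (about 7.9 × 10^17) unique strings. Each extra character of length multiplies that count by 31.
The probability that a second generated string equals the first one is: (1/31) ^ 12
Or more generically: (1/(number of characters)) ^ length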
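As a quick check of these numbers, here is a small Python sketch (Python only so the arithmetic is easy to run; the function names are illustrative and not from the question). It counts the possible strings and, as an extra, uses the standard birthday approximation for the chance of at least one repeat among k generated strings. It assumes every character is picked uniformly and independently, which is only as true as the underlying rand.

```python
import math

ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"  # the 31 characters from the question


def count_strings(length, alphabet_size=len(ALPHABET)):
    """Number of distinct strings of the given length."""
    return alphabet_size ** length


def prob_same_twice(length, alphabet_size=len(ALPHABET)):
    """Chance that a second string equals one specific earlier string."""
    return 1 / count_strings(length, alphabet_size)


def prob_any_repeat(k, length, alphabet_size=len(ALPHABET)):
    """Birthday approximation: 1 - exp(-k(k-1) / (2N)) for k generated strings."""
    n = count_strings(length, alphabet_size)
    return -math.expm1(-k * (k - 1) / (2 * n))


print(count_strings(12))                        # number of 12-character strings
print(count_strings(13) // count_strings(12))   # growth factor per extra character
print(prob_any_repeat(1_000_000, 12))           # chance of any repeat in a million strings
```

The birthday figure is usually the more useful one in practice, since it answers "will I ever see a duplicate?" rather than "will the next string match this particular one?".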

Related

How to isolate leftmost bytes in integer

This has to be done in Perl:
I have integers on the order of e.g. 30_146_890_129 and 17_181_116_691 and 21_478_705_663.
These are supposedly made up of 6 bytes, where:
bytes 0-1 : value a
bytes 2-3 : value b
bytes 4-5 : value c
I want to isolate what value a is. How can I do this in Perl?
I've tried using the >> operator:
perl -e '$a = 330971351478 >> 16; print "$a\n";'
5050222
perl -e '$a = 17181116691 >> 16; print "$a\n";'
262163
But these numbers are not on the order of what I am expecting, more like 0-1000.
Bonus if I can also get values b and c but I don't really need those.
Thanks!
number >> 16 returns the number shifted right by 16 bits, not the bits that were shifted out, as you seem to assume. To get the last 16 bits you can use, for example, number % 2**16 or number & 0xffff. To get b and c, shift first and then take the last 16 bits, i.e.
$a = $number & 0xffff;
$b = ($number >> 16) & 0xffff;
$c = ($number >> 32) & 0xffff;
If you have 6 bytes, you don't need to convert them to a number first. You can use one of the following, depending on the order of the bytes (uppercase represents the most significant byte):
my ($num_c, $num_b, $num_a) = unpack('nnn', "\xCC\xcc\xBB\xbb\xAA\xaa");
my ($num_a, $num_b, $num_c) = unpack('nnn', "\xAA\xaa\xBB\xbb\xCC\xcc");
my ($num_c, $num_b, $num_a) = unpack('vvv', "\xcc\xCC\xbb\xBB\xaa\xAA");
my ($num_a, $num_b, $num_c) = unpack('vvv', "\xaa\xAA\xbb\xBB\xcc\xCC");
If you are indeed provided with a number (0xCCccBBbbAAaa), you can convert it to bytes and then extract the numbers you want from it as follows:
my ($num_c, $num_b, $num_a) = unpack('xxnnn', pack('Q>', $num));
Alternatively, you could also use an arithmetic approach like you attempted.
my $num_a = $num & 0xFFFF;
my $num_b = ( $num >> 16 ) & 0xFFFF;
my $num_c = $num >> 32;
While the previous two solutions required a Perl built to use 64-bit integers, the following will work with any build of Perl:
my $num_a = $num % 2**16;
my $num_b = ( $num / 2**16 ) % 2**16;
my $num_c = int( $num / 2**32 );
Let's look at ( $num >> 16 ) & 0xFFFF in detail.
Original number: 0x0000CCccBBbbAAaa
After shifting: 0x00000000CCccBBbb
After masking: 0x000000000000BBbb
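The shift-then-mask idea is language independent. As a cross-check, here is a minimal Python sketch of both approaches described above; the function names are made up for illustration, and it assumes the layout 0xCCccBBbbAAaa with a in the lowest 16 bits.

```python
import struct


def split_fields(num):
    """Split a 48-bit integer 0xCCccBBbbAAaa into (a, b, c) by shifting and masking."""
    a = num & 0xFFFF
    b = (num >> 16) & 0xFFFF
    c = (num >> 32) & 0xFFFF
    return a, b, c


def split_fields_bytes(num):
    """Same result via bytes: pack as 8 big-endian bytes, skip 2, read three 16-bit values."""
    c, b, a = struct.unpack(">xxHHH", struct.pack(">Q", num))
    return a, b, c


print(split_fields(0xCCCCBBBBAAAA))
print(split_fields_bytes(17181116691))
```

If the values that come out are not in the expected range, the byte order or the field order is probably different from the one assumed here.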

Calculating Factorials using QBasic

I'm writing a program that calculates the factorials of 5 numbers and outputs the results in tabular form, but I keep getting zeros.
Factorial formula: n! = n × (n-1)!
I tried:
CLS
DIM arr(5) AS INTEGER
FOR x = 1 TO 5
INPUT "Enter Factors: ", n
NEXT x
f = 1
FOR i = 1 TO arr(n)
f = f * i
NEXT i
PRINT
PRINT "The factorial of input numbers are:";
PRINT
FOR x = 1 TO n
PRINT f(x)
NEXT x
END
and I'm expecting:
Numbers    Factorials
5          120
3          6
6          720
8          40320
4          24
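You made a few mistakes:
FOR i = 1 TO arr(n)
Here n is just the last value typed in the input loop, not an index. You also never stored any values into arr, so the loop bound is 0 and the loop body never runs.
PRINT f(x)
Here you read from an array f that is never defined or filled in your code (it is separate from the plain variable f), which is why you get zeros.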
Possible solution to calculate arrays of factorials:
CLS
DIM arr(5) AS INTEGER
DIM ans(5) AS LONG
FOR x = 1 TO 5
INPUT "Enter Factors: ", arr(x)
f& = 1
FOR i = 1 TO arr(x)
f& = f& * i
NEXT i
ans(x) = f&
NEXT x
PRINT
PRINT "The factorial of input numbers are:";
PRINT
PRINT "Numbers", "Factorials"
FOR x = 1 TO 5
PRINT arr(x), ans(x)
NEXT x
END
I don't have a BASIC interpreter right in front of me, but I think this is what you're looking for:
CLS
DIM arr(5) AS INTEGER
DIM ans(5) AS LONG 'You need a separate array to store results in.
FOR x = 1 TO 5
INPUT "Enter Factors: ", arr(x)
NEXT x
FOR x = 1 to 5
f& = 1
FOR i = 1 TO arr(x)
f& = f& * i
NEXT i
ans(x) = f&
NEXT x
PRINT
PRINT "The factorial of input numbers are:";
PRINT
PRINT "Numbers", "Factorials"
FOR x = 1 TO 5
PRINT STR$(arr(x)), ans(x)
NEXT x
END
Just a comment, though: in programming, you should avoid reusing variables unless you are short on memory. It can be done right, but it creates many opportunities for hard-to-find bugs in larger programs.
Possible solution to calculate arrays of factorials and square roots:
CLS
PRINT "Number of values";: INPUT n
DIM arr(n) AS INTEGER
DIM ans(n) AS LONG
FOR x = 1 TO n
PRINT "Enter value"; x;: INPUT arr(x)
f& = 1
FOR i = 1 TO arr(x)
f& = f& * i
NEXT i
ans(x) = f&
NEXT x
PRINT
PRINT "The factorial/square root of input numbers are:";
PRINT
PRINT "Number", "Factorial", "Squareroot"
FOR x = 1 TO n
PRINT arr(x), ans(x), SQR(arr(x))
NEXT x
END
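For comparison, the structure shared by the answers above (keep the inputs, compute each factorial into a separate result, then print both side by side) can be sketched in Python. This is only an illustration with a hypothetical function name; the inputs are hard-coded to the numbers from the question instead of being read with INPUT.

```python
def factorial_table(numbers):
    """Return (number, factorial) pairs, computing each factorial with a plain loop."""
    rows = []
    for n in numbers:
        f = 1  # reset the accumulator for every number
        for i in range(1, n + 1):
            f *= i
        rows.append((n, f))
    return rows


print(f"{'Numbers':<10}Factorials")
for n, f in factorial_table([5, 3, 6, 8, 4]):
    print(f"{n:<10}{f}")
```

Python integers do not overflow, so there is no equivalent here of choosing LONG over INTEGER for the results.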

Perl CPAN module fisher exact test

Is there any module on CPAN that provides a method to compute Fisher's exact test?
example in R:
in a 2x2 contingency table like:
17 12
8842559 10003821
fisher.test(matrix(data = c(17,8842559,12,10003821), nrow = 2))
Fisher's Exact Test for Count Data
data: counts
p-value = 0.2642
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.7213591 3.6778630
sample estimates:
odds ratio
1.602697
I used Text::NSP::Measures::2D::Fisher module, but I am not sure it does the same as above.
use Text::NSP::Measures::2D::Fisher::twotailed;
my $npp = 10003821;
my $n1p = 8842559;
my $np1 = 12;
my $n11 = 17;
my $twotailed_value = calculateStatistic(
n11 => $n11,
n1p => $n1p,
np1 => $np1,
npp => $npp,
);
if( (my $errorCode = getErrorCode()) ) {
print STDERR $errorCode, " - ", getErrorMessage();
} else {
print getStatisticName, "value for bigram is ", $twotailed_value, "\n";
}
but it does not give me anything
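The table differs between R and Perl, so you can't pass the same four numbers! In the Perl module, n1p, np1 and npp are marginal totals (row total, column total and grand total), not individual cells.
For Perl, this is the table (the values in brackets are the ones you pass):
          word2   ~word2
 word1    [n11]    n12   | [n1p]
~word1     n21     n22   |  n2p
          --------------
          [np1]    np2     [npp]
For R, this is the table (the values in brackets are the ones you pass):
          word2   ~word2
 word1    [n11]   [n12]  |  n1p
~word1    [n21]   [n22]  |  n2p
          --------------
           np1     np2      npp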
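To make the mapping concrete, here is a small Python sketch that derives the bracketed Perl values from the four cells of the R table. The function name is hypothetical, and it assumes the R call in the question fills the matrix column by column (R's default), so that c(17, 8842559, 12, 10003821) means n11 = 17, n21 = 8842559, n12 = 12, n22 = 10003821.

```python
def marginals_from_cells(n11, n12, n21, n22):
    """Turn the four cells of a 2x2 table into the n11/n1p/np1/npp values."""
    return {
        "n11": n11,
        "n1p": n11 + n12,              # total of the first row
        "np1": n11 + n21,              # total of the first column
        "npp": n11 + n12 + n21 + n22,  # grand total
    }


counts = marginals_from_cells(n11=17, n12=12, n21=8842559, n22=10003821)
print(counts)
```

These four values are what would go into the calculateStatistic call in place of the raw cell counts used in the question.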

How to decrypt RSA text from p, q and d ('data greater than mod len' error)

I'm trying to decrypt some text using the p, q & d RSA parameters. This is my code:
use Crypt::OpenSSL::RSA;
my $cipher_text = '21822641296030233094227313655848509440605583067924377543599838215888039562622129112129822895080408267928468534668157995224253484645729278749085139763130764635317451011719149549123004731102607506049461610797920861018820451965633121194245016524243388070479379308761222809258576639629711274935572812821629596774863705897518352434753834386314245125246066390859225185066330366811073476496684635339997048026729834327425483254569562337608819782060439696539771993138092386150797070320410153423661265108318321693803297014167486821806691214248145774922909225478697375135263295662839076540338821496045675607198591575588621659609';
my ($p, $q, $d) = (
Crypt::OpenSSL::Bignum->new_from_decimal('165531801682935357262784768224825567629908164720968584885888440012850606062817307481747891600670103793664550471500745014914678541225436018211939431390053926336912952441897829541865006123774689488999658248640982182224222754377835611000656130261325362538051966725284846900143448968656908810497538272078057741753'),
Crypt::OpenSSL::Bignum->new_from_decimal('161793529444258956657578160951133315733795687396943555542529109270426552912409876020630999202216058708771830991232800413521618941159308875874915491167328976063871230426911602170436153334762815254160844789590951618176633523800724364347786188020741173210831867848084340389279221308498668063580976312456313708227'),
Crypt::OpenSSL::Bignum->new_from_decimal('4726230781685159301129128926091597612191418774972180765730674153946543720175721375641429858288249804644693058129864174539693448753576337835228363947222471089804797108134073771268482070990981157234925023770851307423738245681533737104667110764794379344770670194385194083716134044195705274587539907463141446593440244816853972305589231700346121402158165643863387848676660192091263041614047764528653983145902131144938355047165291147495652786645127063867131916536922764685613090037417336307735248968661966233168304037079723873034096551522712515691482108402916631034263410195810822874808411813091006049133015592459279891521'),
);
my $n = $p->mul($q, Crypt::OpenSSL::Bignum::CTX->new());
# I use d as e because e is mandatory and I don't have it. Later I'll use public_decrypt instead of decrypt.
my $rsa = Crypt::OpenSSL::RSA->new_key_from_parameters($n, $d, undef, $p, $q);
my $text = $rsa->public_decrypt($cipher_text);
But that gives me:
RSA.xs:202: OpenSSL error: data greater than mod len at test.pl line 14
I don't really understand what this means: the data is not greater than d or n.
:-?
EDITED
The following Python code works, so as far as I know the keys and data are good.
from Crypto.PublicKey import RSA
p = 165531801682935357262784768224825567629908164720968584885888440012850606062817307481747891600670103793664550471500745014914678541225436018211939431390053926336912952441897829541865006123774689488999658248640982182224222754377835611000656130261325362538051966725284846900143448968656908810497538272078057741753
q = 161793529444258956657578160951133315733795687396943555542529109270426552912409876020630999202216058708771830991232800413521618941159308875874915491167328976063871230426911602170436153334762815254160844789590951618176633523800724364347786188020741173210831867848084340389279221308498668063580976312456313708227
d = 4726230781685159301129128926091597612191418774972180765730674153946543720175721375641429858288249804644693058129864174539693448753576337835228363947222471089804797108134073771268482070990981157234925023770851307423738245681533737104667110764794379344770670194385194083716134044195705274587539907463141446593440244816853972305589231700346121402158165643863387848676660192091263041614047764528653983145902131144938355047165291147495652786645127063867131916536922764685613090037417336307735248968661966233168304037079723873034096551522712515691482108402916631034263410195810822874808411813091006049133015592459279891521
n = p * q
cypher_text = 21822641296030233094227313655848509440605583067924377543599838215888039562622129112129822895080408267928468534668157995224253484645729278749085139763130764635317451011719149549123004731102607506049461610797920861018820451965633121194245016524243388070479379308761222809258576639629711274935572812821629596774863705897518352434753834386314245125246066390859225185066330366811073476496684635339997048026729834327425483254569562337608819782060439696539771993138092386150797070320410153423661265108318321693803297014167486821806691214248145774922909225478697375135263295662839076540338821496045675607198591575588621659609
decrypter = RSA.construct((n, 0L, d, p, q))
text = decrypter.key._decrypt(cypher_text)
print(text)
The error signifies that there's a mismatch between the key size and the data.
I'm not a python expert but IIRC it handles big numbers in core, and you've got cipher_text as a text string.
The cipher text looks like a decimal which we want as a binary string, so I think you want:
my $cipher_text = Crypt::OpenSSL::Bignum->new_from_decimal( '21822....' )->to_bin;
This turns out to be 256 bytes which sounds about right.
Now the error is:
RSA.xs:202: OpenSSL error: unknown padding type at rsa.pl line 20
I'm not sure about this, but I have to take the kids out now anyway :)
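For reference, the Python snippet in the question appears to perform unpadded ("textbook") RSA, which is just a modular exponentiation on integers, while the Perl call works on a binary string. The sketch below shows both pieces with a deliberately tiny toy key (p = 61, q = 53); the helper names are made up, and this is not a statement about how Crypt::OpenSSL::RSA handles padding.

```python
def int_to_bytes(value):
    """Big-endian byte string for a non-negative integer, without leading zero bytes."""
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, "big")


def raw_rsa_decrypt(cipher_int, d, n):
    """Textbook RSA decryption: c^d mod n, with no padding handling at all."""
    return pow(cipher_int, d, n)


# Toy parameters for illustration only; real keys are far larger.
p, q = 61, 53
n = p * q          # modulus
e, d = 17, 2753    # e * d = 1 mod (p-1)*(q-1)

message = 65
cipher = pow(message, e, n)
print(raw_rsa_decrypt(cipher, d, n))   # recovers the original message
print(len(int_to_bytes(2**2047)))      # byte length of a 2048-bit number
```

With a 2048-bit modulus the binary form of the cipher text is at most 256 bytes, which matches the length observed above.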

Randomly selecting letters by frequency of use

After feeding a few Shakespeare books to my Perl script, I have a hash with the 26 English letters as keys and the number of their occurrences in the texts as values:
%freq = (
a => 24645246,
b => 1409459,
....
z => 807451,
);
and of course the total number of all letters - let's say in the $total variable.
Is there a nice trick to generate a string holding 16 random letters (a letter can occur several times), weighted by their frequency of use?
It is to be used in a word game similar to Ruzzle.
Something elegant, like picking a random line from a file, as suggested by a Perl Cookbook recipe:
rand($.) < 1 && ($line = $_) while <>;
The Perl Cookbook trick for picking a random line (which can also be found in perlfaq5) can be adapted for weighted sampling too:
my $chosen;
my $sum = 0;
foreach my $item (keys %freq) {
$sum += $freq{$item};
$chosen = $item if rand($sum) < $freq{$item};
}
Here, $sum corresponds to the line counter $. and $freq{$item} to the constant 1 in the Cookbook version.
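The same single-pass idea, sketched in Python for readers who want to experiment with it. The names weighted_pick and random_letters are illustrative; the rng parameter is an addition that only exists to make the sketch reproducible with a seeded generator.

```python
import random


def weighted_pick(freq, rng=random):
    """One pass over the items: item i replaces the current choice with probability w_i / (w_1 + ... + w_i)."""
    chosen = None
    total = 0
    for item, weight in freq.items():
        total += weight
        if rng.random() * total < weight:
            chosen = item
    return chosen


def random_letters(freq, count=16, rng=random):
    """Build a string of independently drawn weighted letters."""
    return "".join(weighted_pick(freq, rng) for _ in range(count))


print(random_letters({"a": 5, "b": 3, "c": 3}))
```

Each call walks the whole table, so drawing 16 letters from 26 costs 16 × 26 steps, which is negligible here.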
If you're going to be picking a lot of weighted random samples, you can speed this up a bit with some preparation (note that this destroys %freq, so make a copy first if you want to keep it):
# first, scale all frequencies so that the average frequency is 1:
my $avg = 0;
$avg += $_ for values %freq;
$avg /= keys %freq;
$_ /= $avg for values %freq;
# now, prepare the array we'll need for fast weighted sampling:
my @lookup;
while (keys %freq) {
my ($lo, $hi) = (sort {$freq{$a} <=> $freq{$b}} keys %freq)[0, -1];
push @lookup, [$lo, $hi, $freq{$lo} + @lookup];
$freq{$hi} -= (1 - $freq{$lo});
delete $freq{$lo};
}
Now, to draw a random weighted sample from the prepared distribution, you just do this:
my $r = rand @lookup;
my ($lo, $hi, $threshold) = @{$lookup[$r]};
my $chosen = ($r < $threshold ? $lo : $hi);
(This is basically the Square Histogram method described in Marsaglia, Tsang & Wang (2004), "Fast Generation of Discrete Random Variables", J. Stat. Soft. 11(3) and originally due to A.J. Walker (1974).)
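Here is a rough Python port of the preparation and sampling steps above, mainly to make the table structure easier to inspect. It follows the Perl code closely but works on a copy of the frequencies; build_lookup and draw are names chosen for this sketch, and the rng parameter is an addition for reproducibility.

```python
import random


def build_lookup(freq):
    """Build the table of (lo, hi, threshold) cells; each cell covers one unit of [0, len)."""
    avg = sum(freq.values()) / len(freq)
    scaled = {letter: count / avg for letter, count in freq.items()}  # copy, freq stays intact
    lookup = []
    while scaled:
        lo = min(scaled, key=scaled.get)   # currently rarest letter
        hi = max(scaled, key=scaled.get)   # currently most common letter
        lookup.append((lo, hi, scaled[lo] + len(lookup)))
        scaled[hi] -= 1 - scaled[lo]       # hi donates the rest of this cell
        del scaled[lo]
    return lookup


def draw(lookup, rng=random):
    """Pick a cell uniformly, then choose lo or hi by comparing against the cell's threshold."""
    r = rng.random() * len(lookup)
    lo, hi, threshold = lookup[int(r)]
    return lo if r < threshold else hi


table = build_lookup({"a": 5, "b": 3, "c": 3})
print("".join(draw(table) for _ in range(16)))
```

After the one-time preparation, every draw costs a single random number and one comparison, regardless of the alphabet size.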
I have no clue about Perl syntax, so I'll just write pseudo-code. You can do something like this:
sum <= 0
foreach (letter in {a, ..., z})
    sum <= sum + freq(letter)
pick r, a random integer in [0, sum[
letter <= 'a' - 1
do
    letter <= letter + 1
    r <= r - freq(letter)
while r >= 0
letter is the resulting value
The idea behind this code is to make a stack of boxes, one for each letter. The size of each box is the frequency of the letter. Then we choose a random location on this stack and see which letter's box we landed in.
Example :
freq(a) = 5
freq(b) = 3
freq(c) = 3
sum = 11
|    a    |  b  |  c  |
 - - - - - - - - - - -
When we choose a 0 <= r < 11, we have the following probabilities
Pick a 'a' = 5 / 11
Pick a 'b' = 3 / 11
Pick a 'c' = 3 / 11
Which is exactly what we want.
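The stack-of-boxes idea translates almost directly into Python. This sketch uses a hypothetical helper, pick_by_stack, that takes the position r explicitly so that all 11 positions of the example can be listed; in real use r would come from a random integer in [0, sum).

```python
def pick_by_stack(freq, r):
    """Return the letter whose box contains position r, for 0 <= r < sum of all frequencies."""
    for letter, size in freq.items():
        if r < size:
            return letter
        r -= size  # step past this letter's box
    raise ValueError("r is outside the stack")


freq = {"a": 5, "b": 3, "c": 3}
print("".join(pick_by_stack(freq, r) for r in range(11)))  # one letter per position
```

Listing every position makes it easy to confirm that 5 of the 11 positions land on 'a' and 3 each on 'b' and 'c'.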
You can first build a table of the running sum of the frequencies. So if you have the following data:
%freq = (
a => 15,
b => 25,
c => 30,
d => 20
);
the running sums would be:
%running_sums = (
a => 0,
b => 15,
c => 40, # 15 + 25
d => 70, # 15 + 25 + 30
);
$max_sum = 90; # 15 + 25 + 30 + 20
To pick a single letter with the weighted frequency, select a random number in [0,90), then do a linear search on the running_sums table for the range that contains that number. For example, if your random number is 20, the matching range is 15-40, which belongs to the letter 'b'. Using linear search gives a total running time of O(m*n), where m is the number of letters we need and n is the size of the alphabet (here m=16, n=26). This is essentially what @default locale does.
Instead of linear search, you can also do a binary search on the running_sum table to get the closest number rounded down. This gives a total running time of O(m*log(n)).
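A minimal Python sketch of the binary-search variant, using the standard bisect module on the list of range starts. The function names are illustrative, and the sketch assumes the frequencies are positive integers (as letter counts are), so that a random integer in [0, total) can be used.

```python
import bisect
import itertools
import random


def build_running_sums(freq):
    """Return (letters, starts, total); starts[i] is the running sum before letters[i]."""
    letters = list(freq)
    cumulative = list(itertools.accumulate(freq[letter] for letter in letters))
    starts = [0] + cumulative[:-1]
    return letters, starts, cumulative[-1]


def letter_for(letters, starts, r):
    """Binary search for the last range start that is <= r."""
    return letters[bisect.bisect_right(starts, r) - 1]


def pick_letter(letters, starts, total, rng=random):
    """Draw one weighted letter."""
    return letter_for(letters, starts, rng.randrange(total))


letters, starts, total = build_running_sums({"a": 15, "b": 25, "c": 30, "d": 20})
print(starts, total)
print(letter_for(letters, starts, 20))
print("".join(pick_letter(letters, starts, total) for _ in range(16)))
```

With only 26 letters the difference from a linear scan is small; the binary search matters more when the table is large.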
For picking m letters, though, there is a faster way than O(m*log(n)), particularly if n < m. First you generate m random numbers in sorted order (which can be done without sorting, in O(m)), then you do a linear matching between the list of sorted random numbers and the list of running sums. This gives a total running time of O(m+n). The code in its entirety, as run on Ideone:
use List::Util qw(shuffle);
my %freq = (...);
# list of letters in sorted order, i.e. "a", "b", "c", ..., "x", "y", "z"
# sorting is O(n*log(n)) but it can be avoided if you already have
# a list of letters you're interested in using
my @letters = sort keys %freq;
# compute the running_sums table in O(n)
my $sum = 0;
my %running_sum;
for (@letters) {
$running_sum{$_} = $sum;
$sum += $freq{$_};
}
# generate a string with letters in $freq frequency in O(m)
my $curmax = 1;
my $curletter = $#letters;
my $i = 16; # the number of letters we want to generate
my @result;
while ($i > 0) {
# $curmax generates a uniformly distributed decreasing random number in [0,1)
# see http://repository.cmu.edu/cgi/viewcontent.cgi?article=3483&context=compsci
$curmax = $curmax * (1-rand())**(1. / $i);
# scale the random number $curmax to [0,$sum)
my $num = int ($curmax * $sum);
# find the range that includes $num
while ($num < $running_sum{$letters[$curletter]}) {
$curletter--;
}
push(@result, $letters[$curletter]);
$i--;
}
# since @result is sorted, you may want to shuffle it first
# Fisher-Yates shuffle is O(m)
print "", join('', shuffle(@result));