RSA Prime Generation using Provable vs Probable Prime Construction - rsa

I am trying to implement RSA prime generation for P and Q based on FIP186-4 specification. The specification describes two different implementations: Section 3.2 Provable Prime Construction vs. Section 3.3 Probable Prime Construction. Initially, I tried implementing the probable prime approach because it is easier to understand and implement, but I discovered it is very slow because of the number of iterations needed to find P and Q primes (worst case it takes 15 minutes). Next, I decided to try the provable prime approach but I found out the algorithm is much more complex and might be slow as well. Below are my two issues:
In Section C.10, Step 12, how to eliminate the sqrt(2) to the expression x = floor(sqrt(2))(2^(L−1))) + (x mod (2^L − floor((sqrt(2)(2^(L−1))))) so that I can represent it as whole numbers using BigNum representation?
In Section C.10, Step 14, is there a fast way to compute y in the interval [1, p2] such that 0 = ( y p0 p1–1) mod p2? The specification doesn't specify a method to implement this. My initial thought was to perform a linear search staring from integer 1 and up but that can be very slow because p2 can be a very large number.
I tried searching online for help on this issue, but I discovered a lot of examples don't even comply with FIPS186-4. I assume it is because these two methods are too slow.

Related

Universal Hashing Integers

This is my first thread here and I would like to ask you a couple of questions for universal hashing of integers.
A universal hashing algorithm is supposed to use this:
equation =
((a*x+b)mod p) mod m
a=random number from 1 to p-1
b=random number from 0 to p-1
x= the Key
p= a prime number >=m
m=the size of the array
I know the numbers I am going to hash are on the range of 1-2969.
But I cannot understand how to use this equation in order to make as low collisions as possible.
At the time a and b are random I cannot do anything about it.
My question is how I am supposed to pick the prime if I have more than one choice, the range of primes I can use are from 2 to 4999.
I tried to pick the first available that corresponds the requirements for the function but sometimes it can return negative numbers. I have searched on Google and Stackoverflow but I could not figure out what I am not doing wrong.
I am coding in C. Also, I can use only universal hashing.
Thank your for your time.

network security- cryptography

I was solving a RSA problem and facing difficulty to compute d
plz help me with this
given p-971, q-52
Ø(n) - 506340
gcd(Ø(n),e) = 1 1< e < Ø(n)
therefore gcd(506340, 83) = 1
e= 83 .
e * d mod Ø(n) = 1
i want to compute d , i have all the info
can u help me how to computer d from this.
(83 * d) mod 506340 = 1
i am a little wean in maths so i am having difficulties finding d from the above equation.
Your value for q is not prime 52=2^2 * 13. Therefore you cannot find d because the maths for calculating this relies upon the fact the both p and q are prime.
I suggest working your way through the examples given here http://en.wikipedia.org/wiki/RSA_%28cryptosystem%29
Normally, I would hesitate to suggest a wikipedia link such as that, but I found it very useful as a preliminary source when doing a project on RSA as part of my degree.
You will need to be quite competent at modular arithmetic to get to grips with how RSA works. If you want to understand how to find d you will need to learn to find the Modular multiplicative inverse - just google this, I didn't come across anything incorrect when doing so myself.
Good luck.
A worked example
Let's take p=11, q=5. In reality you would use very large primes but we are going to be doing this by hand to we want smaller numbers. Keep both of these private.
Now we need n, which is given as n=pq and so in our case n=55. This needs to be made public.
The next item we need is the totient of n. This is simply phi(n)=(p-1)(q-1) so for our example phi(n)=40. Keep this private.
Now you calculate the encryption exponent, e. Defined such that 1<e<phi(n) and gcd(e,phi(n))=1. There are nearly always many possible different values of e - just pick one (in a real application your choice would be determined by additional factors - different choices of e make the algorithm easier/harder to crack). In this example we will choose e=7. This needs to be made public.
Finally, the last item to be calculated is d, the decryption exponent. To calculate d we must solve the equation ed mod phi(n) = 1. This is most commonly calculated using the Extended Euclidean Algorithm. This algorithm solves the equation phi(n)x+ed=1 subject to 1<d<phi(n), where x is an unknown multiplicative factor - which is identical to writing the previous equation without using mod. In our particular example, solving this leads to d=23. This should be kept private.
Then your public key is: n=55, e=7
and your private key is: n=55, d=23
To see the workthrough of the Extended Euclidean Algorithm check out this youtube video https://www.youtube.com/watch?v=kYasb426Yjk. The values used in that video are the same as the ones used here.
RSA is complicated and the mathematics gets very involved. Try solving a couple of examples with small values of p and q until you are comfortable with the method before attempting a problem with large values.

Random numbers that add to 1 with a minimum increment: Matlab

Having read carefully the previous question
Random numbers that add to 100: Matlab
I am struggling to solve a similar but slightly more complex problem.
I would like to create an array of n elements that sums to 1, however I want an added constraint that the minimum increment (or if you like number of significant figures) for each element is fixed.
For example if I want 10 numbers that sum to 1 without any constraint the following works perfectly:
num_stocks=10;
num_simulations=100000;
temp = [zeros(num_simulations,1),sort(rand(num_simulations,num_stocks-1),2),ones(num_simulations,1)];
weights = diff(temp,[],2);
I foolishly thought that by scaling this I could add the constraint as follows
num_stocks=10;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp2 = [zeros(num_simulations,1),sort(round(rand(num_simulations,num_stocks-1)*scaling)/scaling,2),ones(num_simulations,1)];
weights2 = diff(temp2,[],2);
However though this works for small values of n & small values of increment, if for example n=1,000 & the increment is 0.1% then over a large number of trials the first and last numbers have a mean which is consistently below 0.1%.
I am sure there is a logical explanation/solution to this but I have been tearing my hair out to try & find it & wondered anybody would be so kind as to point me in the right direction. To put the problem into context create random stock portfolios (hence the sum to 1).
Thanks in advance
Thank you for the responses so far, just to clarify (as I think my initial question was perhaps badly phrased), it is the weights that have a fixed increment of 0.1% so 0%, 0.1%, 0.2% etc.
I did try using integers initially
num_stocks=1000;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp = [zeros(num_simulations,1),sort(randi([0 scaling],num_simulations,num_stocks-1),2),ones(num_simulations,1)*scaling];
weights = (diff(temp,[],2)/scaling);
test=mean(weights);
but this was worse, the mean for the 1st & last weights is well below 0.1%.....
Edit to reflect excellent answer by Floris & clarify
The original code I was using to solve this problem (before finding this forum) was
function x = monkey_weights_original(simulations,stocks)
stockmatrix=1:stocks;
base_weight=1/stocks;
r=randi(stocks,stocks,simulations);
x=histc(r,stockmatrix)*base_weight;
end
This runs very fast, which was important considering I want to run a total of 10,000,000 simulations, 10,000 simulations on 1,000 stocks takes just over 2 seconds with a single core & I am running the whole code on an 8 core machine using the parallel toolbox.
It also gives exactly the distribution I was looking for in terms of means, and I think that it is just as likely to get a portfolio that is 100% in 1 stock as it is to geta portfolio that is 0.1% in every stock (though I'm happy to be corrected).
My issue issue is that although it works for 1,000 stocks & an increment of 0.1% and I guess it works for 100 stocks & an increment of 1%, as the number of stocks decreases then each pick becomes a very large percentage (in the extreme with 2 stocks you will always get a 50/50 portfolio).
In effect I think this solution is like the binomial solution Floris suggests (but more limited)
However my question has arrisen because I would like to make my approach more flexible & have the possibility of say 3 stocks & an increment of 1% which my current code will not handle correctly, hence how I stumbled accross the original question on stackoverflow
Floris's recursive approach will get to the right answer, but the speed will be a major issue considering the scale of the problem.
An example of the original research is here
http://www.huffingtonpost.com/2013/04/05/monkeys-stocks-study_n_3021285.html
I am currently working on extending it with more flexibility on portfolio weights & numbers of stock in the index, but it appears my programming & probability theory ability are a limiting factor.......
One problem I can see is that your formula allows for numbers to be zero - when the rounding operation results in two consecutive numbers to be the same after sorting. Not sure if you consider that a problem - but I suggest you think about it (it would mean your model portfolio has fewer than N stocks in it since the contribution of one of the stocks would be zero).
The other thing to note is that the probability of getting the extreme values in your distribution is half of what you want them to be: If you have uniformly distributed numbers from 0 to 1000, and you round them, the numbers that round to 0 were in the interval [0 0.5>; the ones that round to 1 came from [0.5 1.5> - twice as big. The last number (rounding to 1000) is again from a smaller interval: [999.5 1000]. Thus you will not get the first and last number as often as you think. If instead of round you use floor I think you will get the answer you expect.
EDIT
I thought about this some more, and came up with a slow but (I think) accurate method for doing this. The basic idea is this:
Think in terms of integers; rather than dividing the interval 0 - 1 in steps of 0.001, divide the interval 0 - 1000 in integer steps
If we try to divide N into m intervals, the mean size of a step should be N / m; but being integer, we would expect the intervals to be binomially distributed
This suggests an algorithm in which we choose the first interval as a binomially distributed variate with mean (N/m) - call the first value v1; then divide the remaining interval N - v1 into m-1 steps; we can do so recursively.
The following code implements this:
% random integers adding up to a definite sum
function r = randomInt(n, limit)
% returns an array of n random integers
% whose sum is limit
% calls itself recursively; slow but accurate
if n>1
v = binomialRandom(limit, 1 / n);
r = [v randomInt(n-1, limit - v)];
else
r = limit;
end
function b = binomialRandom(N, p)
b = sum(rand(1,N)<p); % slow but direct
To get 10000 instances, you run this as follows:
tic
portfolio = zeros(10000, 10);
for ii = 1:10000
portfolio(ii,:) = randomInt(10, 1000);
end
toc
This ran in 3.8 seconds on a modest machine (single thread) - of course the method for obtaining a binomially distributed random variate is the thing slowing it down; there are statistical toolboxes with more efficient functions but I don't have one. If you increase the granularity (for example, by setting limit=10000) it will slow down more since you increase the number of random number samples that are generated; with limit = 10000 the above loop took 13.3 seconds to complete.
As a test, I found mean(portfolio)' and std(portfolio)' as follows (with limit=1000):
100.20 9.446
99.90 9.547
100.09 9.456
100.00 9.548
100.01 9.356
100.00 9.484
99.69 9.639
100.06 9.493
99.94 9.599
100.11 9.453
This looks like a pretty convincing "flat" distribution to me. We would expect the numbers to be binomially distributed with a mean of 100, and standard deviation of sqrt(p*(1-p)*n). In this case, p=0.1 so we expect s = 9.4868. The values I actually got were again quite close.
I realize that this is inefficient for large values of limit, and I made no attempt at efficiency. I find that clarity trumps speed when you develop something new. But for instance you could pre-compute the cumulative binomial distributions for p=1./(1:10), then do a random lookup; but if you are just going to do this once, for 100,000 instances, it will run in under a minute; unless you intend to do it many times, I wouldn't bother. But if anyone wants to improve this code I'd be happy to hear from them.
Eventually I have solved this problem!
I found a paper by 2 academics at John Hopkins University "Sampling Uniformly From The Unit Simplex"
http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf
In the paper they outline how naive algorthms don't work, in a way very similar to woodchips answer to the Random numbers that add to 100 question. They then go on to show that the method suggested by David Schwartz can also be slightly biased and propose a modified algorithm which appear to work.
If you want x numbers that sum to y
Sample uniformly x-1 random numbers from the range 1 to x+y-1 without replacement
Sort them
Add a zero at the beginning & x+y at the end
difference them & subtract 1 from each value
If you want to scale them as I do, then divide by y
It took me a while to realise why this works when the original approach didn't and it come down to the probability of getting a zero weight (as highlighted by Floris in his answer). To get a zero weight in the original version for all but the 1st or last weights your random numbers had to have 2 values the same but for the 1st & last ones then a random number of zero or the maximum number would result in a zero weight which is more likely.
In the revised algorithm, zero & the maximum number are not in the set of random choices & a zero weight occurs only if you select two consecutive numbers which is equally likely for every position.
I coded it up in Matlab as follows
function weights = unbiased_monkey_weights(num_simulations,num_stocks,min_increment)
scaling=1/min_increment;
sample=NaN(num_simulations,num_stocks-1);
for i=1:num_simulations
allcomb=randperm(scaling+num_stocks-1);
sample(i,:)=allcomb(1:num_stocks-1);
end
temp = [zeros(num_simulations,1),sort(sample,2),ones(num_simulations,1)*(scaling+num_stocks)];
weights = (diff(temp,[],2)-1)/scaling;
end
Obviously the loop is a bit clunky and as I'm using the 2009 version the randperm function only allows you to generate permutations of the whole set, however despite this I can run 10,000 simulations for 1,000 numbers in 5 seconds on my clunky laptop which is fast enough.
The mean weights are now correct & as a quick test I replicated woodchips generating 3 numbers that sum to 1 with the minimum increment being 0.01% & it also look right
Thank you all for your help and I hope this solution is useful to somebody else in the future
The simple answer is to use the schemes that work well with NO minimum increment, then transform the problem. As always, be careful. Some methods do NOT yield uniform sets of numbers.
Thus, suppose I want 11 numbers that sum to 100, with a constraint of a minimum increment of 5. I would first find 11 numbers that sum to 45, with no lower bound on the samples (other than zero.) I could use a tool from the file exchange for this. Simplest is to simply sample 10 numbers in the interval [0,45]. Sort them, then find the differences.
X = diff([0,sort(rand(1,10)),1]*45);
The vector X is a sample of numbers that sums to 45. But the vector Y sums to 100, with a minimum value of 5.
Y = X + 5;
Of course, this is trivially vectorized if you wish to find multiple sets of numbers with the given constraint.

Dot Product: * Command vs. Loop gives different results

I have two vectors in Matlab, z and beta. Vector z is a 1x17:
1 0.430742139435890 0.257372971229541 0.0965909090909091 0.694329541928697 0 0.394960106863064 0 0.100000000000000 1 0.264704325268675 0.387774594078319 0.269207605609567 0.472226643323253 0.750000000000000 0.513121013402805 0.697062571025173
... and beta is a 17x1:
6.55269487769363e+26
0
0
-56.3867588816768
-2.21310778926413
0
57.0726052009847
0
3.47223691057151e+27
-1.00249317882651e+27
3.38202232046686
1.16425987969027
0.229504956512063
-0.314243264212449
-0.257394312588330
0.498644243389556
-0.852510642195370
I'm dealing with some singularity issues, and I noticed that if I want to compute the dot product of z*beta, I potentially get 2 different solutions. If I use the * command, z*beta = 18.5045. If I write a loop to compute the dot product (below), I get a solution of 0.7287.
summation=0;
for i=1:17
addition=z(1,i)*beta(i);
summation=summation+addition;
end
Any idea what's going on here?
Here's a link to the data: https://dl.dropboxusercontent.com/u/16594701/data.zip
The problem here is that addition of floating point numbers is not associative. When summing a sequence of numbers of comparable magnitude, this is not usually a problem. However, in your sequence, most numbers are around 1 or 10, while several entries have magnitude 10^26 or 10^27. Numerical problems are almost unavoidable in this situation.
The wikipedia page http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems shows a worked example where (a + b) + c is not equal to a + (b + c), i.e. demonstrating that the order in which you add up floating point numbers does matter.
I would guess that this is a homework assignment designed to illustrate these exact issues. If not, I'd ask what the data represents to suss out the appropriate approach. It would probably be much more productive to find out why such large numbers are being produced in the first place than trying to make sense of the dot product that includes them.

Problem with arithmetic using logarithms to avoid numerical underflow (take 2)

I have two lists of fractions;
say A = [ 1/212, 5/212, 3/212, ... ]
and B = [ 4/143, 7/143, 2/143, ... ].
If we define A' = a[0] * a[1] * a[2] * ... and B' = b[0] * b[1] * b[2] * ...
I want to calculate a normalised value of A' and B'
ie specifically the values of A' / (A'+B') and B' / (A'+B')
My trouble is A are B are both quite long and each value is small so calculating the product causes numerical underflow very quickly...
I understand turning the product into a sum through logarithms can help me determine which of A' or B' is greater
ie max( log(a[0])+log(a[1])+..., log(b[0])+log(b[1])+... )
and that using logs I can calculate the value of A' / B' but how do I do A' / A'+B'
My best bet to date is to keep the number representations as fractions, ie A = [ [1,212], [5,212], [3,212], ... ] and implement my own arithmetic but it's getting clumsy and I have a feeling there is a (simple) way of logarithms I'm just missing....
The numerators for A and B don't come from a sequence. They might as well be random for the purpose of this question. If it helps the denominators for all values in A are the same, as are all the denominators for B.
Any ideas most welcome!
( ps. I asked a similar question 24 hours ago regarding the ratio A'/B' but it was actually the wrong question to ask. I'm actually after A'/(A'+B'). Sorry, my mistake. )
I see few ways here
First of all you can notice that
A' / (A'+B') = 1 / (1 + B'/A')
and you know how to calculate B'/A' with logarithms.
Another way is to implement your own rational arithmetic but you don't need to go far about it. Since you know that denominators are the same for the whole array, it immediately gives you
numerator(A') = numerator(a[0]) * numerator(a[1]) ...
denumerator(A') = denumerator(a[0]) ^ A.length
All you now need to do is to sum A' and B' which is easy and then multiply A' and 1/(A'+B') which is also easy. The hardest part here is to normalize resulting value which is done with modulo operation and is trivial.
Alternatively, since you most likely using some popular scripting language, most of them have classes for rational arithmetic built in, Python and Ruby have them for sure.
My best bet to date is to keep the number representations as fractions and implement my own arithmetic but it's getting clumsy
What language are you using? If you can overload operators, it should be really easy to make up a Fraction class that you can treat as a number pretty much everywhere.
As an example, determining whether a fraction A / B is larger than C / D is basically comparing whether A * D is larger than B * C.
Both A and B have the same denominator in each fraction you mention. Is that true for every term in the list? If that's so, why don't you factor that out when you calculate the product? The denominator will simply be X^n, when X is the value and n is the number of terms in the list.
If you do that, you'll have the opposite problem: overflow in the numerator. You know that it can't be smaller than max(X)^n, where max(X) is the maximum value in the numerator and n is the number of terms in the list. If you can calculate that, you can see if your computer will have a problem. You can't put 10 pounds of anything in a 5 pound bag.
Unfortunately, the properties of logarithms limit you to the following simplifications:
(source: equationsheet.com)
and
(source: equationsheet.com)
So you're stuck with:
(source: equationsheet.com)
If you're using a language that supports infinite precision numbers (e.g., Java BigDecimal) it might make your life a little easier. But there's still a good argument for doing some thinking before you compute. Why use brute force when you can be elegant?
Well ... if you know A'(A'+B'), then B'(A'+B') should be one minus that. I personally would not use logarithms. I would use the actual fractions. I would also use some sort of BigInt class to represent the numerator and denominator. Which language are you using? Python can be a good fit.