Why does gcrypt say to recalculate the coefficient of an RSA key when converting from SSL format to gcrypt? - rsa

The documentation for libgcrypt says:
An RSA private key is described by this S-expression:
(private-key
(rsa
(n n-mpi)
(e e-mpi)
(d d-mpi)
(p p-mpi)
(q q-mpi)
(u u-mpi)))
...and...
p-mpi
RSA secret prime p.
q-mpi
RSA secret prime q with p < q.
u-mpi
Multiplicative inverse u = p^{-1} mod q.
...and...
Note that OpenSSL uses slightly different parameters: q < p and u = q^{-1} mod p.
To use these parameters you will need to swap the values and recompute u.
Here is example code to do this:
if (gcry_mpi_cmp (p, q) > 0)
  {
    gcry_mpi_swap (p, q);
    gcry_mpi_invm (u, p, q);
  }
If in one format p is the smaller prime and in the other q is the smaller prime, and given that the two equations are identical save for exchanging p and q, is it really necessary to recompute u? Is it not sufficient simply to exchange p and q?
As a side question, I am curious why gcrypt doesn't use the same values as the PKCS#1 encoding:
RSAPrivateKey ::= SEQUENCE {
version Version,
modulus INTEGER, -- n
publicExponent INTEGER, -- e
privateExponent INTEGER, -- d
prime1 INTEGER, -- p
prime2 INTEGER, -- q
exponent1 INTEGER, -- d mod (p-1)
exponent2 INTEGER, -- d mod (q-1)
coefficient INTEGER, -- (inverse of q) mod p
otherPrimeInfos OtherPrimeInfos OPTIONAL
}
o modulus is the RSA modulus n.
o publicExponent is the RSA public exponent e.
o privateExponent is the RSA private exponent d.
o prime1 is the prime factor p of n.
o prime2 is the prime factor q of n.
o exponent1 is d mod (p - 1).
o exponent2 is d mod (q - 1).
o coefficient is the CRT coefficient q^(-1) mod p.

The answer is that recalculating u is unnecessary: simply swap the usage of p and q and it all works. This is because, after the swap, gcrypt's definition u = p^{-1} mod q evaluated with the new p and q is literally OpenSSL's u = q^{-1} mod p evaluated with the old ones; the stored coefficient is already the right number.
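A quick check with toy numbers: take the OpenSSL-style pair p = 11, q = 3 (so q < p), with u = q^{-1} mod p = 3^{-1} mod 11 = 4, since 3 * 4 = 12 ≡ 1 (mod 11). After swapping, p = 3 and q = 11, and gcrypt's u = p^{-1} mod q = 3^{-1} mod 11 = 4: the very same value, no recomputation needed.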
As a general comment on gcrypt, the asymmetric cryptography APIs are terrible. Truly abysmal.
There is no support for loading keys from file in ANY format.
There is no support to simply encrypt/decrypt a buffer. Instead you must convert the buffer to an MPI, wrap that in an S-expression, encrypt, unwind the resulting S-expression to find the right piece, and then call yet another function to get at the data itself. Decryption requires slightly more complexity in creating the S-expression to decrypt from a buffer, but retrieving the data is only one function call.
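To illustrate the dance, here is a minimal sketch of RSA-encrypting a buffer with documented libgcrypt calls (error handling omitted; buffer, buffer_len and the already-loaded public-key S-expression pubkey are assumed):

gcry_mpi_t plain_mpi, cipher_mpi;
gcry_sexp_t data, ciphertext, a_item;
unsigned char *out;
size_t out_len;

/* buffer -> MPI -> S-expression */
gcry_mpi_scan (&plain_mpi, GCRYMPI_FMT_USG, buffer, buffer_len, NULL);
gcry_sexp_build (&data, NULL, "(data (flags raw) (value %m))", plain_mpi);

/* encrypt, then unwind the resulting S-expression to reach the bytes */
gcry_pk_encrypt (&ciphertext, data, pubkey);
a_item = gcry_sexp_find_token (ciphertext, "a", 0);
cipher_mpi = gcry_sexp_nth_mpi (a_item, 1, GCRYMPI_FMT_USG);
gcry_mpi_aprint (GCRYMPI_FMT_USG, &out, &out_len, cipher_mpi);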
The parameters to the S-expression for the private key do not match the values in standard PKCS#1 format (although as covered by this question and answer, conversion is fairly easy). Why not?
During the course of my investigations I discovered that there is another GNU encryption library. Why they maintain two I have no idea. The other is called "nettle" and is much better:
*) It uses the GMP library for multi-precision integers, rather than having its own type as gcrypt does (gcry_mpi_t).
*) It supports loading keys from files in a variety of formats (I used it as the basis for my own code to load keys for use with gcrypt).
*) It supports conversion from various formats (PEM->DER, DER->Sexp).
*) It supports a variety of symmetric encryption algorithms and modes.
*) It supports asymmetric encryption/decryption/signing/verification.
I didn't actually use it, so I cannot really comment on the API's usability, but from what I saw it was generally much, much better.
I don't really know the background on nettle, but I do wonder if it was created simply because gcrypt's API is so awful and they would rather start over than enhance gcrypt.

Related

Complexity in distributed system

If I have a network of n nodes with unique IDs chosen from a range, say 1 to n², then on the order of log n bits are sufficient and also necessary to address the nodes in the network.
Assume each node is given an ID in [1, n²], and that we have no control over the assignment of IDs. You agree that if we can encode any such ID in Θ(log n) bits, then your statement of sufficiency follows.
Now, any integer x requires Θ(log x) bits in binary. Thus, writing our maximum integer n² in binary requires Θ(log n²) bits. We notice that:
Θ(log n²)
= Θ(2 log n)      | properties of logarithms
= Θ(log n)         | properties of Θ
This shows the sufficiency. As for necessity: with fewer than ⌈log₂ n²⌉ bits there are fewer bit patterns than possible IDs, so by the pigeonhole principle two distinct IDs would share an address.
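For a concrete check: with n = 1024 nodes, IDs are drawn from [1, n²] = [1, 1048576], and an ID takes
⌈log₂ n²⌉ = ⌈log₂ 1048576⌉ = 20 = 2 · 10 = 2⌈log₂ n⌉
bits: twice as many as for the range [1, n], but still Θ(log n).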

How to perform induction over two inductive predicates?

I am starting with Coq and trying to formalize Automated Amortised Analysis. I am at lemma 4.13:
Lemma preservation : forall (Sigma : prog_sig) (Gamma : context) (p : program)
                            (s : stack) (h h' : heap) (e : expr) (c : type0) (v : val),
  (* 4.1 *) has_type Sigma Gamma e c ->
  (* 4.2 *) eval p s h e v h' ->
  (* 4.3 *) mem_consistant_stack h s Gamma ->
  (* 4.a *) mem_consistant h' v c /\ (* 4.b *) mem_consistant_stack h' s Gamma.
Proof.
  intros Sigma Gamma p s h h' e c v WELL_TYPED EVAL.
The manual proof uses a double induction:
Proof. Note that claim (4.b) follows directly by Lemma 4.8 and Lemma 4.12. Each location l ∈ dom(H) is either left unaltered or is overwritten by the value Bad, and hence does not invalidate memory consistency.
However, the first claim (4.a) requires a proof by induction on the lengths of the derivations of (4.2) and (4.1), ordered lexicographically with the derivation of the evaluation taking priority over the typing derivation. This is required since an induction on the length of the typing derivation alone would fail for the case of function application: in order to allow recursive functions, the type rule for application is a terminal rule relying on the type given for the function in the program's signature. However, proving this case requires induction on the statement that the body of the function is well-typed, which is most certainly a type derivation of a longer length (i.e. longer than one step), prohibiting us from using the induction hypothesis. Note in this particular case that the length of the derivation for the evaluation statement does decrease.
An induction over the length of the derivation for premise (4.2) alone fails similarly. Consider the last step in the derivation of premise (4.1) being derived by the application of a structural rule: then the length of the derivation for (4.2) remains exactly the same, while the length of the derivation for premise (4.1) does decrease by one step.
Using induction on the lexicographically ordered lengths of the type and evaluation derivations allows us to use the induction hypothesis if either the length of the derivation for premise (4.2) is shortened, or if the length of the derivation for premise (4.2) remains unchanged while the length of the typing derivation is reduced. We first treat the cases where the last step in the typing derivation was obtained by application of a structural rule, which are all the cases which leave the length of the derivation for the evaluation unchanged. We then continue to consider the remaining cases based on the evaluation rule that had been applied last to derive premise (4.2), since the remaining type rules are all syntax-directed and thus unambiguously determined by the applied evaluation rule.
How can such a "double induction" be performed in Coq?
I tried induction EVAL; induction WELL_TYPED, but got 418 subgoals, most of which are unprovable.
I also tried to start with induction EVAL and use induction WELL_TYPED later, but got stuck in a similar situation.
I agree with @jbapple that a minimal example would be better. That said, it may be that you are simply missing a concept of the length of a derivation. Note that the usual concept of proof by induction over a predicate actually implements something that is close to induction over the height of derivations, but not quite.
I propose that you exhibit two new predicates eval_n and has_type_n that each express the same as eval and has_type, but with an extra argument meaning "... and the derivation has size n". There are several ways in which size might be defined, but I suspect that height will be enough for you.
Then you can prove
eval a1 .. ak <-> exists n, eval_n a1 .. ak n
and
has_type a1 .. ak <-> exists n, has_type_n a1 .. ak n
You should then be able to prove
forall p : nat * nat, forall a1 ... ak, eval_n a1 .. ak (fst p) ->
has_type_n a1 .. ak (snd p) -> YOUR GOAL
by well-founded induction on pairs of natural numbers, using the lexicographic order construction of library Wellfounded (I suggest Lexicographic_Product.v; it is a bit of an overkill for just pairs of natural numbers, but you only need to find the right instantiation).
This will be unwieldy, because induction hypotheses will only refer to pairs of numbers that are smaller in the lexicographic order, and you will have to perform inversions on the hypotheses concerning eval_n and has_type_n, but that should go through.
There probably exists a simpler solution, but for lack of more information from your side, I can only propose the big gun.
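For concreteness, here is a minimal sketch of that plan (eval_n and has_type_n are the hypothetical sized predicates; the proof body is elided). Instead of instantiating Lexicographic_Product directly, it encodes the lexicographic order as two nested strong inductions on nat, which yields exactly the induction hypotheses described above:

Require Import Arith Wf_nat.

(* eval_n / has_type_n: sized copies of eval and has_type, the extra
   nat recording the height of the derivation. *)
Lemma preservation_sized :
  forall n m Sigma Gamma p s h h' e c v,
    eval_n p s h e v h' n ->
    has_type_n Sigma Gamma e c m ->
    mem_consistant_stack h s Gamma ->
    mem_consistant h' v c /\ mem_consistant_stack h' s Gamma.
Proof.
  (* outer strong induction on the evaluation height n: IHn is usable
     for any smaller n, with the typing height arbitrary again *)
  induction n as [n IHn] using lt_wf_ind.
  (* inner strong induction on the typing height m at this fixed n:
     IHm is usable for the same n and a smaller m; IHn and IHm together
     are the lexicographic induction hypothesis *)
  induction m as [m IHm] using lt_wf_ind.
  (* ... proceed by inversion on the two sized derivations ... *)
Admitted.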

Is it possible to implement universal hashing for the complete range of integers?

I am reading about universal hashing on integers. The mandatory precondition seems to be that we choose a prime number p greater than the maximum possible key.
I am not clear on this point.
If our set of keys is of type int, then this means that the prime number needs to be of the next bigger data type, e.g. long.
But eventually whatever we get as the hash would need to be down-cast to an int to index the hash table. Doesn't this down-casting somehow affect the quality of the universal hashing (I am referring to the distribution of the keys over the buckets)?
If our set of keys are integers then this means that the prime number needs to be of the next bigger data type e.g. long.
That is not a problem; sometimes it is even necessary, since otherwise the hash family cannot be universal. See below for more information.
But eventually whatever we get as the hash would need to be down-casted to an int to index the hash table. Doesn't this down-casting affect the quality of the Universal Hashing (I am referring to the distribution of the keys over the buckets) somehow?
The answer is no. I will try to explain.
Whether p has another data type or not is not important for the hash family to be universal. What matters is that p is big enough, i.e. greater than or equal to u (the size of the universe of keys).
A hash family is universal when the collision probability is at most 1/m.
So the idea is to hold that constraint.
The value of p, in theory, can be as big as a long or more. It just needs to be an integer and prime.
u is the size of the domain/universe (i.e. the number of possible keys): given the universe U = {0, ..., u-1}, u denotes the size |U|.
m is the number of bins or buckets
p is a prime which must be greater than or equal to u
the hash family is defined as H = {h_(a,b)} with h_(a,b)(x) = ((a * x + b) mod p) mod m. Here a and b are integers chosen uniformly at random modulo p, with a ≠ 0; their data type (the domain of values) again does not matter. See Hashing integers on Wikipedia for the notation.
Follow the proof on Wikipedia and you conclude that the collision probability is ⌊p/m⌋ · 1/(p-1), where ⌊·⌋ means rounding down. For p >> m (p considerably bigger than m) this bound tends to 1/m (but this does not mean that the probability gets better the larger p is).
In other words, answering your question: p being of a bigger data type is not a problem here and can even be required. p has to be greater than or equal to u, and a and b have to be randomly chosen integers modulo p (with a ≠ 0), no matter the number of buckets m. With these constraints you can construct a universal hash family.
Maybe a mathematical example could help
Let U be the universe of integers that correspond to unsigned char (in C for example). Then U = {0, ..., 255}
Let p be the next prime equal to or greater than 256, which is 257. Note that p can be of any of these types (short, int or long, signed or unsigned). The point is that the data type does not play a role (in programming, a type mainly denotes a domain of values); whether 257 is a short, an int or a long does not matter for the correctness of the mathematical proof. We could also have chosen a larger p (i.e. a bigger data type); this would not change the proof's correctness.
We say we have 25 buckets, i.e. m = 25. This means a hash family is universal if the collision probability is at most 1/25 = 0.04.
Put in the values for ⌊p/m⌋ · 1/(p-1): ⌊257/25⌋ · 1/256 = 10/256 = 0.0390625, which is smaller than 0.04. It is a universal hash family with the chosen parameters.
We could have chosen m = u = 256 buckets. Then the bound gives ⌊257/256⌋ · 1/256 = 1/256 = 0.00390625, which is exactly 1/m; since universality only requires the collision probability to be at most 1/m, the hash family is still universal.
Let's try with m bigger than p, e.g. m = 300. Then the collision probability is 0, which is smaller than 1/300 ≈ 0.00333. Trivially so: we have more buckets than keys, and since a ≠ 0 the map x ↦ (a * x + b) mod p is injective on U, so there are no collisions at all. Still universal.
Implementation detail example
We have the following:
x of type int (an element of U)
a, b, p of type long
m we'll see later in the example
Choose p so that it is greater than or equal to u, i.e. bigger than every element of U; p is of type long.
Choose a and b (modulo p) randomly. They are of type long, but always < p.
For an x (of type int from U) calculate ((a*x+b) mod p). a*x is of type long, (a*x+b) is also of type long, and so ((a*x+b) mod p) is also of type long. Note that the result of ((a*x+b) mod p) is < p. Let's denote that result h_a_b(x).
h_a_b(x) is now taken modulo m, which means that at this step it depends on the data type of m whether there will be down-casting or not. However, it does not really matter: h_a_b(x) mod m is < m, so its value fits into m's data type, and if it has to be down-cast there is no loss of value. And so you have mapped a key to a bin/bucket.
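A minimal C sketch of exactly these steps (illustrative only: rand() is a poor source of randomness, and the constants match the example above):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define P 257UL /* prime >= u = 256, the size of the unsigned char universe */
#define M 25UL  /* number of buckets, as in the example above */

static unsigned long a, b; /* chosen once, at random, modulo P */

static void pick_hash_function (void)
{
  a = 1 + (unsigned long) rand () % (P - 1); /* a in [1, P-1] */
  b = (unsigned long) rand () % P;           /* b in [0, P-1] */
}

static unsigned long hash_ab (unsigned char x)
{
  /* a*x + b is at most 256*255 + 256, far below ULONG_MAX: no overflow */
  return ((a * x + b) % P) % M;
}

int main (void)
{
  srand ((unsigned) time (NULL));
  pick_hash_function ();
  for (int x = 250; x < 256; x++)
    printf ("h(%d) = %lu\n", x, hash_ab ((unsigned char) x));
  return 0;
}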

RSA-Calculate d without p and q

I am given a task to decrypt a message. However, I am only given the values of n and e. So is it still possible to find the value of d? Is there any shortcut formula that can calculate d without knowing p and q?
The security of RSA is derived from the difficulty in calculating d from e and n (the public key). It sounds like the task you have been set is essentially to break RSA by factoring n into its prime factors p and q, and then using these to calculate d. Assuming n is not too large, factorization should be relatively easy (Wolfram|Alpha may be able to do it for example).
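As a toy illustration of that route (textbook-sized numbers, not a real attack; a real modulus would need a bignum library, and the factoring step is exactly what is meant to be hard):

#include <stdio.h>

/* extended Euclidean algorithm: return d with (e * d) % phi == 1 */
static long modinv (long e, long phi)
{
  long t = 0, newt = 1, r = phi, newr = e;
  while (newr != 0) {
    long q = r / newr, tmp;
    tmp = t - q * newt; t = newt; newt = tmp;
    tmp = r - q * newr; r = newr; newr = tmp;
  }
  return t < 0 ? t + phi : t;
}

int main (void)
{
  long n = 3233, e = 17, p = 0, q = 0;

  /* factor n by trial division -- feasible only because n is tiny */
  for (long i = 2; i * i <= n; i++)
    if (n % i == 0) { p = i; q = n / i; break; }

  long d = modinv (e, (p - 1) * (q - 1));
  printf ("p = %ld, q = %ld, d = %ld\n", p, q, d); /* 53, 61, 2753 */
  return 0;
}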

Generating k pairwise independent hash functions

I'm trying to implement a Count-Min Sketch algorithm in Scala, and so I need to generate k pairwise independent hash functions.
This is a lower-level than anything I've ever programmed before, and I don't know much about hash functions except from Algorithms classes, so my question is: how do I generate these k pairwise independent hash functions?
Am I supposed to use a hash function like MD5 or MurmurHash? Do I just generate k hash functions of the form f(x) = ax + b (mod p), where p is a prime and a and b are random integers? (i.e., the universal hashing family everyone learns in algorithms 101)
I'm looking more for simplicity than raw speed (e.g., I'll take something 5x slower if it's simpler to implement).
Scala already has MurmurHash implemented (it's scala.util.MurmurHash). It's very fast and very good at distributing values. A cryptographic hash is overkill--you'll just take tens or hundreds of times longer than you need to. Just pick k different seeds to start with and, since it's nearly cryptographic in quality, you'll get k largely independent hash codes. (In 2.10, you should probably switch to using scala.util.hashing.MurmurHash3; the usage is rather different but you can still do the same thing with mixing.)
If you only need near values to be mapped to randomly far values this will work; if you want to avoid collisions (i.e. if A and B collide using hash 1 they will probably not also collide using hash 2), then you'll need to go at least one more step and hash not the whole object but subcomponents of it so there's an opportunity for the hashes to start out different.
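For instance, a minimal sketch of the k-seeds idea against the 2.10 MurmurHash3 API (the seed values are arbitrary; any k distinct Ints will do):

import scala.util.hashing.MurmurHash3

// k hash values for one byte array: one MurmurHash3 run per seed
def kHashes(k: Int, bytes: Array[Byte]): Seq[Int] =
  (0 until k).map(seed => MurmurHash3.bytesHash(bytes, seed))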
Probably the simplest approach is to take some cryptographic hash function and "seed" it with different sequences of bytes. For most practical purposes, the results should be independent, as this is one of the key properties a cryptographic hash function should have (if you replace any part of a message, the hash should be completely different).
I'd do something like:
val k = 8 // number of hash functions needed

// for each 0 <= i < k, generate a random sequence of bytes to use as a seed
val randomSeeds: Array[Array[Byte]] =
  Array.fill(k)(Array.fill(16)(scala.util.Random.nextInt().toByte))

def hash(i: Int, value: Array[Byte]): Array[Byte] = {
  val dg = java.security.MessageDigest.getInstance("SHA-1")
  // "seed" the digest with the random prefix chosen for index i
  dg.update(randomSeeds(i))
  // if you need integer hash values, just take 4 bytes
  // of the result and convert them to an Int
  dg.digest(value)
}
Edit:
I don't know the precise requirements of the Count-Min Sketch, maybe a simple hash function would suffice, but it doesn't seem to be the simplest solution.
I suggested a cryptographic hash function, because there you have quite strong guarantees that the resulting hash functions will be very different, and it's easy to implement, just use the standard libraries.
On the other hand, if you have two hash functions of the form f1(x) = ax + b (mod p) and f2(x) = cx + d (mod p), then you can compute one from the other (without knowing x) using the simple linear formula f2(x) = c * a^(-1) * (f1(x) - b) + d (mod p), where a^(-1) is the multiplicative inverse of a modulo p. This suggests that they are not very independent, so you could run into unexpected problems here.
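By contrast, here is a minimal Scala sketch (Scala 2.13 for Random.between; names are illustrative) of drawing k functions from that family with fresh, independently chosen (a, b) per function, which avoids the linear relation above:

import scala.util.Random

// one member of the family h_(a,b)(x) = ((a*x + b) mod p) mod m;
// assumes keys in [0, p-1]
case class AffineHash(a: Long, b: Long, p: Long, m: Int) {
  def apply(x: Int): Int = (((a * x.toLong + b) % p) % m).toInt
}

def makeFamily(k: Int, m: Int): Seq[AffineHash] = {
  val p = 2147483647L // the prime 2^31 - 1; covers all nonnegative Int keys below Int.MaxValue
  Seq.fill(k)(AffineHash(Random.between(1L, p), Random.between(0L, p), p, m))
}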