Does a linear cryptographic hash function exist?
By linear I mean a function f such that:
f(a + b) = f(a) + f(b)
where + is addition mod n for some large constant n.
Yes, the cryptographically strong SWIFFT algorithm (a variant was a contender for the SHA-3 standard) is linear, in the sense that h(a + b) = h(a) + h(b).
It is an interesting example of a hash that is both cryptographically strong and not pseudorandom. It is also another unexpected use of the much lauded FFT algorithm.
http://en.wikipedia.org/wiki/SWIFFT
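To make the linearity property concrete, here is a toy sketch. It is not SWIFFT and has no security claim attached: it just multiplies the input vector by a fixed matrix mod a small prime, which is the simplest map with h(a + b) = h(a) + h(b). The modulus, the dimensions and the names `toy_linear_hash` and `add_mod` are all assumptions made up for this illustration.

```python
import random

Q = 257        # small prime modulus (toy choice, assumed for illustration)
IN_LEN = 16    # input vector length
OUT_LEN = 4    # output vector length (shorter than the input, so it compresses)

rng = random.Random(1)
# Fixed public matrix with entries mod Q; hypothetical, not SWIFFT's actual key.
A = [[rng.randrange(Q) for _ in range(IN_LEN)] for _ in range(OUT_LEN)]

def toy_linear_hash(x):
    """Return A*x mod Q for an input vector x of length IN_LEN."""
    return [sum(a * v for a, v in zip(row, x)) % Q for row in A]

def add_mod(u, v):
    """Component-wise addition mod Q."""
    return [(a + b) % Q for a, b in zip(u, v)]

a = [rng.randrange(Q) for _ in range(IN_LEN)]
b = [rng.randrange(Q) for _ in range(IN_LEN)]
lhs = toy_linear_hash(add_mod(a, b))                    # h(a + b)
rhs = add_mod(toy_linear_hash(a), toy_linear_hash(b))   # h(a) + h(b)
print(lhs == rhs)
```

The two sides agree for any pair of inputs, because matrix-vector multiplication distributes over addition mod Q.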
I am attempting to fit the parameters of a deterministic volatility function for use in the practitioner Black Scholes model.
The formula for which I want to estimate the "a" parameters is:
sig = a0 + a1*K + a2*K^2 + a3*T + a4*T^2 + a5*KT
Where sig, K and T are known; I have multiple observations of K, T and sig combinations but only want a single set of "a" parameters.
How might I go about this? My Google searches and my own attempts have all failed, unfortunately.
Thank you!
The function lsqcurvefit allows you to define the function that you want to fit. It should be straightforward from there on.
http://se.mathworks.com/help/optim/ug/lsqcurvefit.html
Some Mathematics
Notation stuff: index your observations by i and add an error term.
sig_i = a0 + a1*K_i + a2*K_i^2 + a3*T_i + a4*T_i^2 + a5*KT_i + e_i
Something probably not insane to do would be to minimize the square of the error term:
minimize (over a) \sum_i e_i^2
The solution to least squares is a simple linear algebra problem. (See https://stats.stackexchange.com/questions/186196/understanding-linear-algebra-in-ordinary-least-squares-derivation/186289#186289 for a solution if you really care.) (Further note: e_i is a linear function of a. I'm not sure why you would need lsqcurvefit as another answer suggested?)
Matlab Code for OLS (Ordinary Least Squares)
Assuming sig, K, T, and KT are n by 1 vectors
y = sig;
X = [ones(length(sig),1), K, K.^2, T, T.^2, KT];
a = X \ y; %basically computes a = inv(X'*X)*(X'*y) but in a better way
This is an ordinary least squares regression of y on X.
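For reference, the step from "minimize the sum of squared errors" to the formula in the code comment can be written out as below. This is only a sketch of the standard derivation, using the same y and X as in the code (row i of X is [1, K_i, K_i^2, T_i, T_i^2, K_i T_i]) and assuming X'X is invertible, i.e. the columns of X are not collinear.

```latex
\min_{a}\ \sum_i e_i^2 \;=\; \min_{a}\ \lVert y - Xa \rVert^2
\quad\Longrightarrow\quad
\nabla_a \lVert y - Xa \rVert^2 = -2\,X^\top (y - Xa) = 0
\quad\Longrightarrow\quad
X^\top X\, a = X^\top y
\quad\Longrightarrow\quad
a = (X^\top X)^{-1} X^\top y
```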
Further Ideas
Depending on the distribution of your error terms, correlated errors, etc., regular OLS may be inefficient or possibly even inappropriate. I'm not familiar enough with the details of this problem to know. You may want to check what people do.
E.g. a technique that's less sensitive to big outliers is to minimize the absolute value of the error:
minimize (over a) \sum_i |e_i|
If you have a good statistical model of how the data is generated, you could do maximum likelihood estimation. Anyway... this rapidly devolves into a multi-quarter statistics class.
I'm using multiplicative hashing with different values of M being random 64-bit primes. Should I also use a multiplicative factor A? What are the best choices for A?
I want to calculate sha1 hash of a set (unordered list) of elements. I have already calculated sha1 hash of each element. I'm considering two solutions:
1. Sort the elements by their hashes and calculate a top hash of the sorted list.
2. Treat the element hashes as 160-bit integer values and XOR (bitwise operation) them together into one 160-bit hash.
Is the second solution weaker in terms of secure hash function properties (pre-image resistance, second pre-image resistance, collision resistance)?
Option 1 is what is done in ERS: that standard uses hash trees, where each node contains a hash value computed over the set of hash values from the child nodes; since order is not significant in the tree, the values are sorted lexicographically before hashing. This is good, and, as far as we know, safe.
Option 2 is very unsafe: if the hash function has 160-bit output, then I can easily generate 160 random inputs such that the corresponding hash values constitute a basis of the vector space GF(2)^160, at which point I can produce a matching set for any aggregate hash value. Attack cost is negligible.
Option 3 suggested by @paj28 (sorting the values to hash, then hashing them) is fine, too, as long as you "concatenate" the sorted values with an unambiguous separator. For instance, if you hash the set of strings containing "bar" and "foo", you don't want to obtain the same hash value as with the set of strings containing "ba" and "rfoo". It is easier to get something safe when all values to hash have the same length.
Therefore, use option 1: hash each value in the set, then sort the hash values in lexicographic order, and hash the sorted list of values again.
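A minimal sketch of option 1, assuming the elements are available as byte strings (the function name `set_hash` is made up for this example):

```python
import hashlib

def set_hash(elements):
    """Hash an unordered collection of byte strings, independent of order."""
    # Step 1: hash each element; every SHA-1 digest is exactly 20 bytes.
    digests = [hashlib.sha1(e).digest() for e in elements]
    # Step 2: sort the digests lexicographically (Python compares bytes that way).
    digests.sort()
    # Step 3: hash the concatenation; fixed-length digests make it unambiguous.
    return hashlib.sha1(b"".join(digests)).hexdigest()

same_a = set_hash([b"foo", b"bar"])
same_b = set_hash([b"bar", b"foo"])
other = set_hash([b"ba", b"rfoo"])
print(same_a == same_b, same_a == other)
```

Because every digest has the same length, no separator is needed between the sorted values, which sidesteps the "bar"/"foo" versus "ba"/"rfoo" ambiguity.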
On the attack with option 2: this is linear algebra. Suppose that you have k vectors of n bits, such that none of them is equal to the XOR of some of the k-1 other vectors (they are said to be linearly independent). Then consider a new random vector v; the probability that this vector is equal to the XOR of some of the k vectors is 2^(k-n), i.e. it is at most 1/2 as long as k < n, and tiny when k is much smaller than n. If the new vector v is indeed linearly independent of the k vectors you already have (which happens with probability 1-2^(k-n)), then add it to the set: you now have k+1 linearly independent vectors.
Recurse: you will soon obtain n vectors of n bits which are linearly independent. But you cannot go further, because the probability that a new vector is linearly independent of the n previous ones has dropped to 0. The n vectors are said to be a basis of the vector space.
In this case, the vectors are obtained by simply hashing values (random values, or values with structure, it does not matter much, because the hash function acts as a randomizer).
For a given set of k vectors, determining whether a new vector v is linearly independent of the k vectors is easy with Gaussian elimination. The same algorithm lets you know, once you have a basis, which of your n basis vectors shall be XORed together to yield any vector v'. In the setup of this question, this means that once I have produced n values m_i such that the h(m_i) constitute a basis, then for any target n-bit output t, I can use Gaussian elimination to work out which of my h(m_i) may be XORed together to yield exactly the value t. The corresponding m_i values are then a preimage set for t.
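The attack described above can be sketched in a few lines. This is an illustration under assumptions, not a tuned implementation: hashes are handled as Python integers, the candidate messages are just numbered byte strings, and the names `build_basis` and `preimage_set` are invented for the example.

```python
import hashlib

N = 160  # output size of SHA-1 in bits

def h(msg: bytes) -> int:
    """SHA-1 digest as a 160-bit integer (a vector over GF(2))."""
    return int.from_bytes(hashlib.sha1(msg).digest(), "big")

def build_basis():
    """Collect N messages whose hashes are linearly independent over GF(2).

    pivots maps a leading-bit position to (reduced_vector, mask), where mask
    records which collected messages were XORed to obtain reduced_vector.
    """
    pivots, messages, counter = {}, [], 0
    while len(messages) < N:
        msg = b"msg-%d" % counter  # arbitrary inputs; the naming is illustrative
        counter += 1
        vec, mask = h(msg), 1 << len(messages)
        # Gaussian elimination step: cancel leading bits against known pivots.
        while vec and (vec.bit_length() - 1) in pivots:
            pvec, pmask = pivots[vec.bit_length() - 1]
            vec ^= pvec
            mask ^= pmask
        if vec:  # nonzero remainder: the new hash is independent, keep it
            pivots[vec.bit_length() - 1] = (vec, mask)
            messages.append(msg)
    return pivots, messages

def preimage_set(target: int, pivots, messages):
    """Return the messages whose hashes XOR together to exactly target."""
    vec, mask = target, 0
    while vec:
        pvec, pmask = pivots[vec.bit_length() - 1]
        vec ^= pvec
        mask ^= pmask
    return [m for i, m in enumerate(messages) if (mask >> i) & 1]

pivots, messages = build_basis()
target = h(b"aggregate hash of some legitimate set")
chosen = preimage_set(target, pivots, messages)
acc = 0
for m in chosen:
    acc ^= h(m)
print(acc == target, len(chosen))
```

Building the basis takes only slightly more than 160 hash evaluations on average, which is why the attack cost is described as negligible.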
The other option (3) is to sort the elements first, then combine them into a single string using a separator that cannot appear as part of an element.
Of these possibilities, 2 would concern me the most. I can't think right now of a practical way to attack it, but it seems the riskiest.
So 1 and 3 are basically fine. But I would recommend 3 because you are using the hash in the way it is intended.
How does a constant before the key in the formula:
h(k) = (const * key) % m,
affect the distribution of the hash values in the table?
Are there any rules on how to choose such a constant to minimize collisions and get an even distribution of the keys in the hash table?
The constant factor should be prime, and if I remember correctly it should be relatively prime w.r.t. the modulus. This is all discussed at great length in Knuth Volume III.
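A small experiment shows why the constant should be relatively prime to the modulus. The table size 12 and the constants 7 and 8 are arbitrary values picked for the demonstration, and `bucket_counts` is a made-up helper name.

```python
def bucket_counts(const, m, keys):
    """Count how many keys land in each of m buckets for h(k) = (const * k) % m."""
    counts = [0] * m
    for k in keys:
        counts[(const * k) % m] += 1
    return counts

m = 12
keys = range(120)
coprime = bucket_counts(7, m, keys)  # gcd(7, 12) == 1
shared = bucket_counts(8, m, keys)   # gcd(8, 12) == 4
used_coprime = sum(1 for c in coprime if c)
used_shared = sum(1 for c in shared if c)
print(used_coprime, used_shared)
```

With a constant coprime to m, multiplication only permutes the residues, so all 12 buckets stay reachable. With a common factor of 4, only 12 / 4 = 3 buckets are ever used, whatever the keys are.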
Is there a hash function with the following properties?
is associative
is not commutative
easily implementable on 32-bit integers: int32 hash(int32, int32)
If I am correct, such a function would allow achieving the following goals:
calculate hash of concatenated string from hashes of substrings
calculate hash concurrently
calculate hash of list implemented on binary tree - including order, but excluding how tree is balanced
The best I have found so far is multiplication of 4x4 matrices of bits, but that's awkward to implement and reduces the space to 16 bits.
I am grateful for any help.
Polynomial rolling hash could help:
H(A1,...,An) = (H(A1,...,An-1) * Base + An) Mod P
It's easy to concatenate two results or subtract a prefix/suffix from a result, as long as the length is known.
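One way to carry the length along is to pair each hash with Base^length mod P. The sketch below assumes byte strings as input; the base 257, the modulus 2^31 - 1 and the names `summary` and `combine` are illustrative choices, not something prescribed by the formula above.

```python
BASE = 257          # illustrative base, larger than any byte value
P = (1 << 31) - 1   # illustrative prime modulus (2^31 - 1)

def poly_hash(data: bytes) -> int:
    """H(A1..An) = (H(A1..An-1) * BASE + An) mod P, with H(empty) = 0."""
    h = 0
    for byte in data:
        h = (h * BASE + byte) % P
    return h

def summary(data: bytes):
    """Pair a substring's hash with BASE^len(data) mod P."""
    return (poly_hash(data), pow(BASE, len(data), P))

def combine(left, right):
    """Combine the summaries of two adjacent substrings (left comes first)."""
    h_left, p_left = left
    h_right, p_right = right
    return ((h_left * p_right + h_right) % P, (p_left * p_right) % P)

a, b, c = summary(b"hello"), summary(b", "), summary(b"world")
left_first = combine(combine(a, b), c)
right_first = combine(a, combine(b, c))
whole = summary(b"hello, world")
print(left_first == right_first == whole, combine(a, c) == combine(c, a))
```

On these pairs, `combine` is associative (any grouping gives the hash of the whole string) and not commutative, at the cost of storing two integers per substring instead of one.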
Matrix multiplication is associative and non-commutative.
You could try representing your hashes as matrices but this will result in a loss of information if they have 0 determinant (which is likely!).
So instead you should generate a triangular matrix with a diagonal of 1's to ensure that you have a determinant of 1 (this guarantees that composition does not lose information).
Furthermore, the composition of triangular matrices produces a new triangular matrix, so reading the result of a composition works the same way as generating it.
Note: to use this method, the length of your hash must be a triangular number!
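As a rough sketch of the idea, take 3x3 upper triangular matrices with 1's on the diagonal, which leaves 3 free entries (3 being a triangular number). Working mod 256 with one byte per entry is an assumption for the example, and `to_matrix` and `combine` are made-up names. This only demonstrates the algebra and says nothing about how well the result distributes as a hash.

```python
MOD = 1 << 8  # each free entry is one byte (illustrative choice)
N = 3         # 3x3 matrices have 3 free entries above the diagonal

def to_matrix(free):
    """Place N*(N-1)//2 free values above a diagonal of 1's."""
    m = [[int(i == j) for j in range(N)] for i in range(N)]
    values = iter(free)
    for i in range(N):
        for j in range(i + 1, N):
            m[i][j] = next(values) % MOD
    return m

def combine(x, y):
    """Matrix product mod MOD: associative, generally not commutative."""
    return [[sum(x[i][k] * y[k][j] for k in range(N)) % MOD
             for j in range(N)] for i in range(N)]

a = to_matrix([1, 2, 3])
b = to_matrix([4, 5, 6])
c = to_matrix([7, 8, 9])
ab, ba = combine(a, b), combine(b, a)
print(combine(ab, c) == combine(a, combine(b, c)), ab == ba)
```

The product stays upper triangular with 1's on the diagonal, so the three entries above the diagonal can be read back out as the combined hash.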