why emBits = modBits - 1 in RSASSA-PSS

The standard (PKCS#1) says that the length of the encoded message used for signing must be emBits = modBits - 1. But where does that come from? I mean, in this standard the signature is based on a hash, and the hash is padded out to a length of emBits. But why must it be modBits - 1? Is it to create a digital signature of the right size?

Let's pretend you have a modulus value of 0b1010111111 (10 bits).
If you run EMSA-PSS-Encode(M, 10), it could (if capable of producing numbers that small) produce 0b1111001011. That value exceeds the modulus, so modulo the modulus it is equivalent to 0b100001100. When verifying, you get the intermediate value 0b100001100 and then find that your signature fails to verify. You sign it again, and this time it works. Confusion abounds.
The answer, ultimately, is "to have a stable algorithm which gets as close to the modulus value as possible without exceeding it". Similarly, EMSA-PKCS1-v1_5 starts with a zero-byte, to ensure the modulus is always a larger number than the encoded value.
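
A tiny sketch of the wrap-around problem, in Python, reusing the toy 10-bit modulus from the example above (not real RSA, just the modular reduction that causes the confusion):

```python
# Toy values from the example above: a 10-bit "modulus" and an encoded
# message of the full modBits = 10 bits.
n = 0b1010111111            # 703
em_10_bits = 0b1111001011   # 971, exceeds the modulus

# What a verifier effectively sees after the mod-n arithmetic:
print(bin(em_10_bits % n))  # 0b100001100 -- a different value, so verification fails

# With emBits = modBits - 1, the encoded message has at most 9 bits,
# so it can never exceed a 10-bit modulus and never wraps around.
em_9_bits = 0b111100101     # any 9-bit value is at most 511 < 703
assert em_9_bits % n == em_9_bits
```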

Related

Using brute force to determine original input data from a hash value

What is Hashing - from blockgeeks.com
I came across this article explaining that to determine the original input from a 128-bit hash value, using brute force, the worst case would be 2^128 - 1 attempts (meaning there are 2^128 possible answers, from what I understand).
But... Isn't 128 bits the length of the hash - not the length of the original input? Wouldn't you have to try all possible cases of input data? And wouldn't that be a totally different number of possibilities unless the input data were also exactly 128 1s or 0s?

Calculate pseudo random number based on an increasing value

I need to calculate a pseudo random number in a given range (e.g. 0-150) based on another, strictly increasing number. Is there a mathematical way to solve this?
I am given one number x, which increases by 1 every day. Based on this number, I need to - somehow - calculate a number in a given range, which seems to be random.
I have a feeling that there is an easy mathematical solution for this, but sadly I am not able to find it. So any help would be appreciated. Thanks!
One sound way to do that is to hash the number x (either its binary representation or its text form) and then use the hash to produce the 'random' number in the desired range (say, by taking the first 32 bits of the hash and deriving the desired value from them by any known method). A cryptographic hash such as SHA-256 can be used, but that is not necessary; MurmurHash may well be a good fit for your application.
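
A minimal sketch of that approach, assuming Python and SHA-256 (the function name and range are just illustrative):

```python
import hashlib

def daily_number(x: int, lo: int = 0, hi: int = 150) -> int:
    """Hash the counter x and map the first 32 bits of the digest into [lo, hi]."""
    digest = hashlib.sha256(str(x).encode()).digest()
    first_32 = int.from_bytes(digest[:4], "big")
    return lo + first_32 % (hi - lo + 1)   # tiny modulo bias, negligible for this use

# The same x always produces the same value, but the sequence looks random:
print([daily_number(x) for x in range(5)])
```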
Normally when you generate a random number, a seed value is used so that the same sequence of pseudorandom numbers isn't repeated. When a seed isn't explicitly given, many systems will use the time as a seed value.
Perhaps you could use x as a seed.
Here's an article explaining seeding: https://www.statisticshowto.com/random-seed-definition/
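
A minimal sketch of the seeding idea, assuming Python's random module (names are illustrative):

```python
import random

def daily_number(x: int, lo: int = 0, hi: int = 150) -> int:
    """Seed a dedicated generator with x so the same day always gives the same value."""
    rng = random.Random(x)        # does not disturb the global random state
    return rng.randint(lo, hi)

print([daily_number(x) for x in range(5)])
```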

Block size for rsa algorithm?

I read a book about cryptography, but I don't understand this statement:
In RSA the block size must be less than or equal to log2(n).
Can somebody help me?
The RSA algorithm involves a modulo-N operation (N being the modulus) to recover the plaintext value. As a result, the plaintext must be less than N, since the modulo operation yields a value between 0 and N-1.
To describe this restriction in terms of a "block size", we need the number of bits in N, which is roughly log2(N): a block of at most floor(log2(N)) bits is guaranteed to encode a value smaller than N.
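
A small illustration in Python, using a toy modulus (the specific numbers are just for demonstration, far too small for real RSA):

```python
# A plaintext block must encode to an integer smaller than N, so its size in
# bits is bounded by (roughly) log2(N).
N = 3233                              # toy modulus (61 * 53)

print(N.bit_length())                 # 12: the number of bits in N, i.e. floor(log2(N)) + 1
max_block_bits = N.bit_length() - 1   # 11: a value of this many bits is always < N

m = (1 << max_block_bits) - 1         # the largest 11-bit value, 2047
assert m < N                          # so it survives mod-N arithmetic unchanged
```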

Design for max hash size given N-digit numerical input and collision related target

Assume a hacker obtains a data set of stored hashes, salts, pepper, and algorithm and has access to unlimited computing resources. I wish to determine a max hash size so that the certainty of determining the original input string is nominally equal to some target certainty percentage.
Constraints:
- The input string is limited to exactly 8 numeric characters, uniformly distributed. There is no inter-digit relation such as a checksum digit.
- The target nominal certainty percentage is 1%.
- Assume the hashing function is uniform.
What is the maximum hash size in bytes so there are nominally 100 (i.e. 1% certainty) 8-digit values that will compute to the same hash? It should be possible to generalize to N numerical digits and X% from the accepted answer.
Please include whether there are any issues with using the first N bytes of the standard 20-byte SHA-1 as an acceptable implementation.
It is recognized that this approach will greatly increase susceptibility to a brute force attack by increasing the possible "correct" answers so there is a design trade off and some additional measures may be required (time delays, multiple validation stages, etc).
It appears you want to ensure collisions: even if a hacker obtains everything and is assumed to be able to brute-force all the hashed values, they will not end up with the original values, but only a set of possible original values for each hashed value.
You could achieve this by executing a precursor step before your normal cryptographic hashing. This precursor step simply folds your set of possible values to a smaller set of possible values. This can be accomplished by a variety of means. Basically, you are applying an initial hash function over your input values. Using modulo arithmetic as described below is a simple variety of hash function. But other types of hash functions could be used.
If you have 8 digit original strings, there are 100,000,000 possible values: 00000000 - 99999999. To ensure that 100 original values hash to the same thing, you just need to map them to a space of 1,000,000 values. The simplest way to do that would be convert your strings to integers, perform a modulo 1,000,000 operation and convert back to a string. Having done that the following values would hash to the same bucket:
00000000, 01000000, 02000000, ....
The problem with that is that the hacker would not only know what 100 values a hashed value could be, but they would know with surety what 6 of the 8 digits are. If the real life variability of digits in the actual values being hashed is not uniform over all positions, then the hacker could use that to get around what you're trying to do.
Because of that, it would be better to choose your modulo value such that the full range of digits are represented fairly evenly for every character position within the set of values that map to the same hashed value.
If different regions of the original string have more variability than other regions, then you would want to adjust for that, since the static regions are easier to just guess anyway. The part the hacker would want is the highly variable part they can't guess. By breaking the 8 digits into regions, you can perform this pre-hash separately on each region, with your modulo values chosen to vary the degree of collisions per region.
As an example you could break the 8 digits thus 000-000-00. The prehash would convert each region into a separate value, perform a modulo, on each, concatenate them back into an 8 digit string, and then do the normal hashing on that. In this example, given the input of "12345678", you would do 123 % 139, 456 % 149, and 78 % 47 which produces 123 009 31. There are 139*149*47 = 973,417 possible results from this pre-hash. So, there will be roughly 103 original values that will map to each output value. To give an idea of how this ends up working, the following 3 digit original values in the first region would map to the same value of 000: 000, 139, 278, 417, 556, 695, 834, 973. I made this up on the fly as an example, so I'm not specifically recommending these choices of regions and modulo values.
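
A minimal sketch of that region-wise pre-hash in Python, using the same illustrative split (3-3-2 digits) and moduli (139, 149, 47); SHA-256 stands in here for whatever cryptographic hash you would normally apply afterwards:

```python
import hashlib

REGIONS = [(0, 3, 139), (3, 6, 149), (6, 8, 47)]   # (start, end, modulus) -- example values only

def pre_hash(digits: str) -> str:
    """Fold each region of the 8-digit string with its modulus, keeping the region widths."""
    parts = []
    for start, end, mod in REGIONS:
        folded = int(digits[start:end]) % mod
        parts.append(str(folded).zfill(end - start))
    return "".join(parts)

def stored_hash(digits: str, salt: bytes) -> str:
    """Apply the pre-hash fold, then the normal cryptographic hash."""
    return hashlib.sha256(salt + pre_hash(digits).encode()).hexdigest()

print(pre_hash("12345678"))   # "12300931", i.e. 123, 009, 31 from the worked example
```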
If the hacker got everything, including source code, and brute-forced it all, he would end up with the values produced by the pre-hash. So for any particular hashed value, he would know that it is one of around 100 possible values. He would know all of those possible values, but he wouldn't know which of them was THE original value that produced the hashed value.
You should think hard before going this route. I'm wary of anything that departs from standard, accepted cryptographic recommendations.

Is a hash result ever the same as the source value?

This is more of a cryptography theory question, but is it possible that the result of a hash algorithm will ever be the same value as the source? For example, say I have a string:
baf34551fecb48acc3da868eb85e1b6dac9de356
If I get the SHA1 hash on it, the result is:
4d2f72adbafddfe49a726990a1bcb8d34d3da162
In theory, is there ever a case where these two values would match? I'm not asking about SHA1 specifically here - it's just my example. I'm just wondering if hashing algorithms are built in such a way as to prevent this.
Well, it would depend on the hashing algorithm, but I'd be surprised to see anything explicitly preventing this. After all, it really shouldn't matter.
I suspect it's very unlikely to happen, of course (for cryptographic hashes)... but even if it does, that shouldn't cause a problem.
For non-crypto hashes (used in hash tables etc) it would be perfectly reasonable to return the source value in some cases. For example, in Java, Integer.hashCode() just returns the embedded value.
Sure, the Python hashing algorithm for integers returns the value of the integer. So hash(1) == 1.
Given a good hashing algorithm, one that returns a seemingly random output, I believe there should be on average one input that gives itself as the output. Let's say the hash can give N possible outputs; then only those N values are candidate inputs for which this is possible. For each of those, the odds of the output matching the input are 1/N, so the expected number of fixed points is N * 1/N, or 1.
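
A toy experiment illustrating that estimate, assuming Python and an 8-bit hash built by truncating SHA-256 (so N = 256 and the expected number of fixed points is about 1):

```python
import hashlib

# Treat the first byte of SHA-256 over a single-byte input as an 8-bit hash.
fixed_points = [x for x in range(256)
                if hashlib.sha256(bytes([x])).digest()[0] == x]

# The expected count is N * (1/N) = 1; any particular hash will give some
# small number around that.
print(len(fixed_points), fixed_points)
```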
A hash function might be defined to avoid ‘fixed points’ where hash(x)==x, but your hash-quine differs a little in that you're taking the string representation in hex of the hash rather than the raw binary. It would, I think, be infeasible to design a hash that could frustrate that, and it's mathematically less interesting since it depends on the arbitrary mapping of 0-F to ASCII character codes.
See Is there an MD5 Fixed Point where md5(x) == x? for a discussion about fixed points in MD5. The probability calculation would be equally true for hex hash-quines and any other hash function with 128 bits of output.
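
For completeness, a tiny Python check for the hex "hash-quine" discussed above, using SHA-1 as in the question's example:

```python
import hashlib

def is_hex_quine(s: str) -> bool:
    """Does the hex SHA-1 digest of the string equal the string itself?"""
    return hashlib.sha1(s.encode()).hexdigest() == s

# The question's example string hashes to a different 40-character value:
print(is_hex_quine("baf34551fecb48acc3da868eb85e1b6dac9de356"))   # False
```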