Minimum and maximum length of X509 serialNumber - x509

The CA/Browser Forum Baseline Requirements section 7.1 states the following:
CAs SHOULD generate non‐sequential Certificate serial numbers that exhibit at least 20 bits of entropy.
At the same time, RFC 5280 section 4.1.2.2 specifies:
Certificate users MUST be able to handle serialNumber values up to 20 octets. Conforming CAs MUST NOT use serialNumber values longer than 20 octets.
Which integer range can I use in order to fulfill both requirements? It is my understanding that the max. value will be 2^159 (730750818665451459101842416358141509827966271488). What is the min. value?

The CA/Browser Forum requirement has since changed to a minimum of 64 bits of entropy. Since the leading bit in the encoding of a positive integer must be 0, there are a number of strategies:
produce a random bit string with 64 bits; if the leading bit is 1, prepend 7 more random bits (the result then fills 9 octets, the same space a 0x00-prefixed value would need). This has the disadvantage that it is not obvious whether the generation considered 63 or 64 bits
produce a random bit string with 64 bits and prepend a 0x00 byte. This is still compact (it wastes only 7-8 bits) and it is obvious that the value has 64 bits of entropy
produce a random string longer than 64 bits, for example 71 or 127 bits (plus a leading 0 bit). 16 bytes seems to be a common length and is well under the 20-byte limit.
By the way, since it is unclear whether the 20-byte maximum length includes a potential 0x00 prefix, for interoperability reasons you should not generate more than 159 bits.
A random integer with x bits of entropy can be produced by generating a random number between 0 and 2^x - 1 (inclusive).
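As a minimal sketch of these strategies in Python (the function name is hypothetical, and the standard-library secrets module stands in for whatever CSPRNG your CA software uses):

    import secrets

    def random_serial_number(entropy_bits=64):
        # Draw a uniform integer from 0 .. 2**entropy_bits - 1, as described above.
        # Staying at or below 159 bits means even the largest value fits in 20
        # octets with a leading 0 bit, so no extra 0x00 prefix is ever required.
        if not 64 <= entropy_bits <= 159:
            raise ValueError("use between 64 and 159 bits of entropy")
        return secrets.randbits(entropy_bits)

    serial = random_serial_number(64)
    # Strategy 2: a fixed 9-octet big-endian representation with a 0x00 prefix,
    # so the leading bit is always 0 and the integer is unambiguously positive.
    encoded = b"\x00" + serial.to_bytes(8, "big")
    print(encoded.hex())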

Related

SHA-256 does bit append still required if input is already N x 512 bit?

If my input is less than a multiple of 512 bits, I need to append padding bits and length bits to my input so that it becomes a multiple of 512 bits.
https://infosecwriteups.com/breaking-down-sha-256-algorithm-2ce61d86f7a3
But what if my input is already a multiple of 512 bits? Is it still required to do the bit append? For example, if my message is already 512 bits long, do I need to pad it to become 1024 bits long?
And what if my input is less than a multiple of 512 bits, but long enough not to leave room for the length bits? For example, my input is 504 bits long.
The linked article seems to explain it rather clearly:
The number of bits we add is calculated as such so that after addition of these bits the length of the message should be exactly 64 bits less than a multiple of 512.
If there are 512 bits, you have to pad to 960 bits, then add 64 bits for the length, for a total of 1024. The same applies to 504 bits, since 504 > 512 - 64. Otherwise you'd have a situation where the last 64 bits are sometimes the length bits and sometimes not, which doesn't seem right.
The only case where you wouldn't add any padding is if the data is already 512*n-64, e.g. if it were 448 bits. Then you add no padding but still add the length bits, and end up with 512.
According to RFC 6234, you should always pad the data with at least one bit set to 1 and append the length (64 bits).
Even if the input is 448 bits long you should pad it, resulting in two 512-bit chunks:
[448 bits of input]['1']['0' x 511][64 bits representing input length]
So this is wrong:
The only case where you wouldn't add any padding is if the data is already 512*n-64, e.g. if it were 448 bits. Then you add no padding but still add the length bits, and end up with 512.
Note: if you accept input only as a byte array, the minimum padding is one byte set to 10000000 (binary).
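To make the rule concrete, here is a small Python sketch (the helper is hypothetical, not part of any library) that computes the padding for a given message length and confirms that a 512-bit input grows to 1024 bits:

    def sha256_padding(message_length_bytes):
        # SHA-256 padding: a 0x80 byte, then zero bytes until the total length is
        # congruent to 56 mod 64, then the original message length in bits as a
        # 64-bit big-endian integer.
        bit_length = message_length_bytes * 8
        zero_bytes = (56 - (message_length_bytes + 1)) % 64
        return b"\x80" + b"\x00" * zero_bytes + bit_length.to_bytes(8, "big")

    # A 64-byte (512-bit) message still gets a full extra block of padding:
    print((64 + len(sha256_padding(64))) * 8)   # 1024 bits
    # A 56-byte (448-bit) message is also padded out to two blocks:
    print((56 + len(sha256_padding(56))) * 8)   # 1024 bits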

How to calculate the risk of conflicts of a 64 bit hash?

I need globally unique ids for my application. I know there is a UUID standard for this, but I wonder if I really need 128 bits.
So I am thinking about writing my own generator that uses the system time, a random number, and the machine's network address to generate an id that fits into 64 bits and can therefore be stored in the unsigned long long int datatype in C++.
How can I determine if 64 bits is enough for me?
64 bits gives 18,446,744,073,709,551,616 (2^64) combinations, which is around 18 and a half quintillion. So if you're generating 1.92 million hashes, the odds of a collision will be about 1 in 10 million.
Source: http://preshing.com/20110504/hash-collision-probabilities
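As a rough Python sketch of the birthday-bound arithmetic behind those numbers (the helper name is just for illustration):

    import math

    def collision_probability(num_ids, bits=64):
        # Birthday approximation: p ~ 1 - exp(-n*(n-1) / (2*N)), which for small p
        # is roughly n**2 / (2*N), with N = 2**bits possible values.
        space = 2.0 ** bits
        return 1.0 - math.exp(-num_ids * (num_ids - 1) / (2.0 * space))

    print(collision_probability(1_920_000))   # ~1e-7, i.e. about 1 in 10 million
    print(collision_probability(6_100_000))   # ~1e-6, about 1 in a million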

pbkdf2 key length

What is the $key_length in PBKDF2?
It says that it will be derived from the input, but I see people using key_lengths of 256 and greater; however, when I enter 256 as a key_length the output is 512 characters. Is this intentional? Can I safely use 64 as the key_length so the output is 128 characters long?
$key_length is the number of output bytes that you desire from PBKDF2. (Note that if key_length is more than the number of output bytes of the hash algorithm, the whole computation is repeated for each additional block of output, slowing down the hashing perhaps more than you desire. SHA-256 gives 32 bytes of output, for example, so asking for 33 bytes will take roughly twice as long as asking for 32.)
The doubling of the length that you mention is because the code converts the output bytes to hexadecimal (i.e. 2 characters per 1 byte) unless you specify $raw_output = true. The test vectors included specify $raw_output = false, since hexadecimal is simply easier to work with and post online. Depending on how you are storing the data in your application, you can decide if you want to store the results as hex, base64, or just raw binary data.
In the IETF Password-Based Cryptography Specification Version 2.0 (RFC 2898), the key length is defined as the "intended length in octets of the derived key, a positive integer, at most (2^32 - 1) * hLen". Here hLen denotes the length in octets of the pseudorandom function output. For further details on PBKDF2 you can refer to How to store passwords securely with PBKDF2.
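For example, with Python's hashlib (which exposes PBKDF2 directly), asking for 64 raw bytes and then hex-encoding them yields 128 characters; the password, salt, and iteration count below are placeholders:

    import hashlib

    password = b"correct horse battery staple"   # placeholder password
    salt = b"example-salt"                        # use a random per-user salt in practice

    # PBKDF2-HMAC-SHA256, 100,000 iterations, 64 bytes of derived key material.
    raw = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, 64)
    print(len(raw))          # 64 raw bytes
    print(len(raw.hex()))    # 128 characters once hex-encoded (2 per byte)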

Signed and unsigned integers -- why are bytes treated differently?

I am learning High Level Assembly Language at the moment, and was going over the concept of signed and unsigned integers. It seems simple enough, however getting to sign extension has confused me.
Take the byte 10011010, which I would take to be 154 in decimal. Indeed, using a binary calculator with a word or anything larger selected shows this as 154 in decimal.
However, if I select the unit to be a byte and type in 10011010, then suddenly it is treated as -102 in decimal. Whenever I widen it starting from a byte, it is sign extended and will always be -102 in decimal.
If I use anything higher than a byte then it remains 154 in decimal.
Could somebody please explain this seeming disparity?
When you select the unit as a byte, the MSB of 10011010 is treated as the sign bit, which makes the one-byte signed interpretation -102 (two's complement).
For integer sizes larger than 8 bits, say 16 bits, the number becomes 0000000010011010, which does not have a 1 in the MSB, so it is treated as a positive integer whose value is 154 in decimal. When you convert the 8-bit byte to a wider type, sign extension preserves the negative interpretation in the larger storage too.
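A quick illustration of the two interpretations and of sign extension, as a Python sketch of the arithmetic (not HLA code):

    raw = bytes([0b10011010])

    # The same 8 bits, read as unsigned and as signed (two's complement):
    print(int.from_bytes(raw, "big", signed=False))   # 154
    print(int.from_bytes(raw, "big", signed=True))    # -102

    # Widening to 16 bits: zero extension keeps 154 (0x009a),
    # while sign extension keeps -102 (0xff9a).
    print(hex(154 & 0xFFFF))    # 0x9a
    print(hex(-102 & 0xFFFF))   # 0xff9a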

Would it be possible to have a UTF-8-like encoding limited to 3 bytes per character?

UTF-8 requires 4 bytes to represent characters outside the BMP. That's not bad; it's no worse than UTF-16 or UTF-32. But it's not optimal (in terms of storage space).
There are 13 byte values (C0-C1 and F5-FF) that are never used, and there are multi-byte sequences that are never used, such as the ones corresponding to "overlong" encodings. If these had been available to encode characters, then more of them could have been represented by 2-byte or 3-byte sequences (of course, at the expense of making the implementation more complex).
Would it be possible to represent all 1,114,112 Unicode code points by a UTF-8-like encoding with at most 3 bytes per character? If not, what is the maximum number of characters such an encoding could represent?
By "UTF-8-like", I mean, at minimum:
The bytes 0x00-0x7F are reserved for ASCII characters.
Byte-oriented find / index functions work correctly. You can't find a false positive by starting in the middle of a character like you can in Shift-JIS.
Update -- My first attempt to answer the question
Suppose you have a UTF-8-style classification of leading/trailing bytes. Let:
A = the number of single-byte characters
B = the number of values used for leading bytes of 2-byte characters
C = the number of values used for leading bytes of 3-byte characters
T = 256 - (A + B + C) = the number of values used for trailing bytes
Then the number of characters that can be supported is N = A + BT + CT².
Given A = 128, the optimum is at B = 0 and C = 43. This allows 310,803 characters, or about 28% of the Unicode code space.
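For what it's worth, a small brute-force check in Python over the formula above confirms that optimum:

    # Maximize N = A + B*T + C*T**2 with A = 128 and T = 256 - (A + B + C).
    A = 128
    best = max(
        (A + B * (256 - A - B - C) + C * (256 - A - B - C) ** 2, B, C)
        for B in range(129)
        for C in range(129 - B)
    )
    print(best)   # (310803, 0, 43): B = 0, C = 43 gives 310,803 characters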
Is there a different approach that could encode more characters?
It would take a little over 20 bits to record all the Unicode code points (assuming your number is correct), leaving over 3 bits out of 24 for encoding which byte is which. That should be adequate.
I fail to see what you would gain by this, compared to what you would lose by not going with an established standard.
Edit: Reading the spec again, you want the values 0x00 through 0x7f reserved for the first 128 code points. That means you only have 21 bits in 3 bytes to encode the remaining 1,113,984 code points. 21 bits is barely enough, but it doesn't really give you enough extra to do the encoding unambiguously. Or at least I haven't figured out a way, so I'm changing my answer.
As to your motivations, there's certainly nothing wrong with being curious and engaging in a little thought exercise. But the point of a thought exercise is to do it yourself, not try to get the entire internet to do it for you! At least be up front about it when asking your question.
I did the math, and it's not possible (if wanting to stay strictly "UTF-8-like").
To start off, the four-byte range of UTF-8 covers U+010000 to U+10FFFF, which is a huge slice of the available characters. This is what we're trying to replace using only 3 bytes.
By special-casing each of the 13 unused prefix bytes you mention, you could gain 65,536 characters each, which brings us to a total of 13 * 0x10000, or 0xD0000.
This would bring the total 3-byte character range to U+010000 to U+0DFFFF, which is almost all, but not quite enough.
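In numbers (a quick Python check of the figures above):

    gained = 13 * 0x10000               # 851,968 = 0xD0000 extra 3-byte code points
    needed = 0x10FFFF - 0x010000 + 1    # 1,048,576 code points above the BMP
    print(gained, needed, gained >= needed)   # 851968 1048576 False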
Sure it's possible. Proof:
2^24 = 16,777,216
So there is enough of a bit-space for 1,114,112 characters but the more crowded the bit-space the more bits are used per character. The whole point of UTF-8 is that it makes the assumption that the lower code points are far more likely in a character stream so the entire thing will be quite efficient even though some characters may use 4 bytes.
Assume 0-127 remains one byte. That leaves 8.4M spaces for 1.1M characters. You can then solve this as an equation. Choose an encoding scheme where the first byte determines how many bytes are used. So there are 128 values. Each of these will represent either 256 characters (2 bytes total) or 65,536 characters (3 bytes total). So:
256x + 65536(128-x) = 1114112 - 128
Solving this, you need 111 values of the first byte for 2-byte characters and the remaining 17 for 3-byte characters. To check:
128 + 111 * 256 + 17 * 65536 = 1,142,656
To put it another way:
128 code points require 1 byte;
28,416 code points require 2 bytes; and
1,114,112 code points require 3 bytes.
Of course, this doesn't allow for the inevitable expansion of Unicode, which UTF-8 does. You can adjust this to the first byte meaning:
0-127 (128) = 1 byte;
128-191 (64) = 2 bytes;
192-255 (64) = 3 bytes.
This would be better because it's simple bitwise AND tests to determine length and gives an address space of 4,210,816 code points.
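A quick arithmetic check of both layouts in Python:

    # Worked example above: 128 one-byte, 111 two-byte, 17 three-byte lead values.
    print(128 + 111 * 256 + 17 * 65536)   # 1,142,656 >= 1,114,112, so everything fits
    # Adjusted layout: 128 one-byte, 64 two-byte, 64 three-byte lead values.
    print(128 + 64 * 256 + 64 * 65536)    # 4,210,816 code points of room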