Is there a standard, canonical method for creating a fingerprint (aka thumbprint) for a JWK?
From what I was reading, it seems that the standard doesn't define how a kid should be specified, which I find odd. To me it makes the most sense for it to be a deterministic value, rather than one that requires a lookup table, so that others could easily recreate the key ID by virtue of possessing the public key.
I am aware that SSH fingerprints and X.509 thumbprints are standardized, but those don't seem like a suitable solution for all environments where JWKs are used (especially browsers), because they are too complex for naive implementations and including the libraries capable of manipulating them (e.g. forge) would waste a lot of memory, bandwidth, and VM compile time.
Update
Officially it's called a "thumbprint" not a "fingerprint".
I think RFC 7638 will answer your question.
This RFC describes a way to compute a hash value over a JWK.
It is really easy to implement:
Keep the required parameters only. For an RSA key: kty, n and e; for an EC key: crv, kty, x and y.
Sort those parameters in lexicographic order: e, kty and n.
Serialize the parameters and their values as JSON: {"e":"AQAB","kty":"RSA","n":"0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAtVT86zwu1RK7aPFFxuhDR1L6tSoc_BJECPebWKRXjBZCiFV4n3oknjhMstn64tZ_2W-5JsGY4Hc5n9yBXArwl93lqt7_RN5w6Cf0h4QyQ5v-65YGjQR0_FDW2QvzqY368QQMicAtaSqzs8KJZgnYb9c7d0zgdAZHzu6qMQvRL5hajrn1n91CbOpbISD08qNLyrdkt-bFTWhAI4vMQFh6WeZu0fM4lFd2NcRwr3XPksINHaQ-G_xBniIqbw0Ls1jF44-csFCur-kEgU8awapJzKnqDKgw"}
Hash it using SHA-256 and encode the digest using base64url: NzbLsXh8uDCcd-6MNwXF4W_7noWXFZAfHkxZsRGC9Xs
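For illustration, here is a minimal sketch of those steps in Scala for the RSA case, using only the JDK; the object and method names are my own, not from any library:

import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Base64

object JwkThumbprint {
  // RFC 7638 thumbprint of an RSA public JWK: the caller passes the
  // base64url-encoded "e" and "n" members exactly as they appear in the JWK.
  def rsaThumbprint(e: String, n: String): String = {
    // Required members only, sorted lexicographically, no whitespace.
    val canonical = s"""{"e":"$e","kty":"RSA","n":"$n"}"""
    val digest = MessageDigest.getInstance("SHA-256")
      .digest(canonical.getBytes(StandardCharsets.UTF_8))
    // The thumbprint is conventionally base64url-encoded without padding.
    Base64.getUrlEncoder.withoutPadding.encodeToString(digest)
  }
}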
I don't believe there is a true standard, but this topic has been discussed in the IETF mailing archives. While the conversation seemed to get a little side-tracked by whether or not canonical JSON was a good idea in general, there was one method that seems reasonable as a standard fingerprinting method.
Remove all "metadata" fields from the JWK (where in this case "metadata" means any non-required key, i.e. anything but "kty" and the key parameters defined by the JWA spec, RFC 7518).
Convert the stripped JWK into "canonical" JSON (sort keys lexicographically, no leading or trailing whitespace, and no whitespace between tokens).
Compute a digest over the resulting JSON string.
There is also no true standard that I am aware of for canonical JSON, but all the sources I've seen agree on at least the rules listed above (which are the only rules that should be relevant for the types of objects used for JWKs).
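A rough sketch of that procedure in Scala; the names are hypothetical, and only string-valued members are handled, which is enough for RSA and EC public JWKs:

import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object StrippedJwkDigest {
  // Keep only the required members, emit "canonical" JSON (sorted keys,
  // no whitespace), and digest the result.
  def digest(jwk: Map[String, String], required: Set[String]): Array[Byte] = {
    val canonical = jwk
      .filter { case (k, _) => required.contains(k) }
      .toSeq.sortBy(_._1)
      .map { case (k, v) => "\"" + k + "\":\"" + v + "\"" }
      .mkString("{", ",", "}")
    MessageDigest.getInstance("SHA-256").digest(canonical.getBytes(StandardCharsets.UTF_8))
  }
}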
Related
A JSON Web Token (JWT) is split into three Base-64-encoded parts, which are concatenated by periods ("."). The first two parts encode JSON objects, the first of which is a header detailing the signature and hashing algorithm, and the second contains the assertions. The third is binary data that is the signature itself.
My question is: why is the JSON Web Token split into three separate parts like this? It seems like it would have made parsing them a lot easier to have encoded them as a single JSON object, like so (the example below is incomplete for brevity's sake):
{
  "header": {
    "alg": "rsa"
  },
  "assertions": {
    "iss": "2019-10-09T12:34:56Z"
  },
  "sig": "qoewrhgoqiethgio3n5h325ijh3=="
}
Stated differently: why didn't the designers of JWT just put all parts of the JWT in a single JSON object like shown above?
IMHO, it would cause more issues. Yes, you could parse it nicely, but what about verification of the signature?
The structure of a JWT is <B64 String>.<B64 String>.<B64 String>. The signature is basically the first two parts signed. It is unlikely that this structure will be modified by various frameworks in any way.
Now consider JSON: during serialisation and deserialisation the order of elements may change. The objects {"a":1,"b":2} and {"b":2,"a":1} might be equal in JavaScript, but if you stringify them, they will generate different signatures.
Also, to check the signature you would need to agree on a standard form of JSON to generate the signature over (for instance, beautified or minified). Again, different choices will generate different signatures.
As a result, there are more hassles than benefits in simply using JSON.
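To illustrate the ordering point, here is a quick sketch in Scala using only the JDK; it is not a JWT implementation, just a demonstration that two equivalent serializations digest differently:

import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Arrays

object OrderMatters {
  private def sha256(s: String): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit = {
    // The same logical object, serialized with a different member order.
    val a = """{"a":1,"b":2}"""
    val b = """{"b":2,"a":1}"""
    // Any digest (and therefore any signature) over the two strings differs.
    println(Arrays.equals(sha256(a), sha256(b))) // prints: false
  }
}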
While I'm not speaking for the people who designed the JWT, I can think of one major reason why your suggestion won't fly:
Headers don't allow newlines
Remember that a primary use-case for a JWT is to use it as a cookie value. Cookies get passed up in headers. Header values don't support newlines: each header key/value pair needs to fit on one line.
Therefore, arbitrary JSON will not work for something that is meant to be passed as a header value in an HTTP request.
Therefore, some sort of encoding is required - which is why base64 is used in the first place. The reason base64 often shows up is because it converts any blob or string into something that can be reliably transported as simple ascii in almost any circumstances. I.e. three base64 encoded "payloads" separated with periods (which isn't a valid character in base64 encoding) is pretty much guaranteed to transport safely and without mangling between just about any system.
JSON cannot make the same guarantees. Of course, you could remove the newlines (JSON ignores whitespace anyway), but quotes are still a problem: they should be encoded in URLs, and maybe not in other circumstances, although they would probably be okay in an HTTP header. As a result it just becomes one more "gotcha" as a developer tries to implement it for the first time.
I'm sure there are other reasons too: this isn't meant to be a comprehensive list.
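To make the transport point concrete, here is a hedged sketch in Scala (JDK only; names are mine and the inputs are placeholders, not a real signed token) of how the three parts end up as a single, header-safe line:

import java.nio.charset.StandardCharsets
import java.util.Base64

object CompactForm {
  private val enc = Base64.getUrlEncoder.withoutPadding

  private def part(bytes: Array[Byte]): String = enc.encodeToString(bytes)

  // base64url output never contains '.', newlines or quotes, so joining the
  // three encoded parts with '.' yields one line that survives HTTP headers
  // and URLs unchanged.
  def compact(headerJson: String, payloadJson: String, signature: Array[Byte]): String =
    Seq(
      part(headerJson.getBytes(StandardCharsets.UTF_8)),
      part(payloadJson.getBytes(StandardCharsets.UTF_8)),
      part(signature)
    ).mkString(".")
}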
The signature cannot be part of what is signed, therefore it has to be separate.
The header and payload could be combined into one JSON object, but it would be bad design. It is the job of your JWT library to check the headers and verify the signature. That can (and should) be done without concern for the payload. It is the job of your application to react to the payload. As long as the signature checks out, that can be done without concern for the headers.
Separate concerns, separate objects.
To the best of my knowledge regarding the security of JWT: if the token doesn't contain sensitive information, we only use its signature feature so that its content cannot be manipulated. Otherwise, if it contains sensitive data, it can be encrypted to protect the data from sniffing. Both can also be employed together if needed.
However, what I cannot understand is why the token is not plain JSON. Why is it encoded when it can be easily decoded? Is there a security reason, or is there another reason behind that?
I searched the net and also took a quick look at RFC 7519 but I couldn’t find any clear and convincing answers.
Mostly to ease processing of JWTs.
A JWT is represented as a sequence of URL-safe parts separated by period ('.') characters. Each part contains a base64url-encoded value.
This ensures both that a) the entire token is URL-safe, which simplifies things for a technology that's mostly used in a web context; and b) it is easy to process the "parts", since the part separator ('.') is guaranteed not to occur inside the parts themselves. If it were plain JSON, a period could appear anywhere within the encoded value itself, and you'd need more complex, JSON-aware parsing to find the separate parts. But given the guarantee that a part cannot contain periods, because it is base64url-encoded, the parsing algorithm is simple (a sketch in code follows the excerpt below):
Verify that the JWT contains at least one period ('.') character.
Let the Encoded JOSE Header be the portion of the JWT before the first period ('.') character.
Base64url decode the Encoded JOSE Header following the restriction that no line breaks, whitespace, or other additional characters have been used.
...
(All excerpts from the RFC.)
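A minimal sketch of those first steps in Scala, using only the JDK; error handling and signature verification are omitted, and the names are my own:

import java.nio.charset.StandardCharsets
import java.util.Base64

object JoseHeader {
  // Split on the first '.', which cannot occur inside a base64url part,
  // then decode that part to recover the JOSE header JSON.
  def decode(jwt: String): String = {
    val firstDot = jwt.indexOf('.')
    require(firstDot >= 0, "a JWT must contain at least one '.'")
    val encodedHeader = jwt.substring(0, firstDot)
    new String(Base64.getUrlDecoder.decode(encodedHeader), StandardCharsets.UTF_8)
  }
}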
I am designing a protocol to exchange IOUs (digital promissory notes).
These should be digitally signed, but the signature should be independent from the data representation (whether it's XML, JSON, binary, little- or big-endian numbers).
Is there any standard on how to sign a list of strings and primitive types (like integers, floating points, booleans)?
There isn't one standard encoding, but you can specify canonical forms for particular encodings.
For JSON you could specify that there is no whitespace outside strings and that keys should be sorted in a particular way.
For ASN.1 there is DER encoding, which is the canonical form of BER.
There is Cryptographic Message Syntax (CMS), but I don't know much about it.
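As an illustration of "pick a canonical form, then sign the bytes", here is a hedged sketch in Scala using an ad-hoc length-prefixed encoding (not a standard) and a plain JDK RSA signature; all names and values are placeholders:

import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.{KeyPairGenerator, Signature}

object CanonicalSign {
  // Ad-hoc canonical form: every field becomes a 4-byte big-endian length
  // followed by its UTF-8 bytes, so the byte stream is the same no matter
  // how the data was originally represented (XML, JSON, binary, ...).
  def canonicalize(fields: Seq[String]): Array[Byte] =
    fields.toArray.flatMap { f =>
      val bytes = f.getBytes(StandardCharsets.UTF_8)
      ByteBuffer.allocate(4).putInt(bytes.length).array() ++ bytes
    }

  def main(args: Array[String]): Unit = {
    val keys = KeyPairGenerator.getInstance("RSA").generateKeyPair()
    val data = canonicalize(Seq("IOU", "alice", "bob", "42.50", "true"))

    val signer = Signature.getInstance("SHA256withRSA")
    signer.initSign(keys.getPrivate)
    signer.update(data)
    val sig = signer.sign()

    val verifier = Signature.getInstance("SHA256withRSA")
    verifier.initVerify(keys.getPublic)
    verifier.update(data)
    println(verifier.verify(sig)) // true, as long as the same canonical bytes are rebuilt
  }
}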
The better question is: what is the best format for verifying digitally signed data primitives?
The answer is XML, formatted and signed according to the XAdES standard. XAdES is harmonized with the related standards, and many implementations participate in interoperability tests hosted by ETSI.
Unless it is easy to verify a digitally signed format, the signature has limited value.
You can sign any bit stream and store/maintain the signature as a detached signature. But then you and the relying parties (the recipients) need to deal with two files. One for the data and one for the signature.
The advantage of xml with XAdES is that the format enables the signed xml file to include the digital signature.
You can create an equivalent of XAdES for another data format such as json. But a new format has limited use unless it becomes popular and standardized. XAdES has already accomplished this, so it is the way to go.
Added
Re: comment--
I want to provide non-repudiation. I understand that I have to save the information I signed. But I was hoping that I don't have to save it as XML but could rather save all values included in the signature in a database (less verbosely) and uniquely reconstruct the signed string from them before verifying.
Technically, you can do that. You'll need to watch out for spacing issues within the xml. But practically, not a good idea. Why:
Proving non-repudiation requires that you meet the applicable burden of proof that the alleged signer really did sign the data.
You may be trying to convince the original signer of this, an expert third party (an auditor) or non-experts (lawyers and juries). You want to make it easy and simple to convince these people. Schemes such as "re-creating" the signed file are not simple to understand compared with "here is the original signed file. Its signature verifies and it was signed with the digital certificate belonging to Susan Signer."
To keep it simple, I'd suggest signing an XAdES XML file. Then extract the data from the file and use it in your dbms. Hang on to the original signed file in your dbms or elsewhere. In case of a dispute, produce the original file and show that it verifies. A second part of the audit would be to show that your dbms has the same data values as the signed XML.
The programming and storage costs of hanging on to the original, signed, xml file are de minimis, when compared with your goal of proving non-repudiation of the data.
By the way, how is the signer's certificate managed? If it is anything less than a QSCD (Qualified Signature Creation Device), such as storing the cert in the file system, then you have another problem: no way to conclusively prove that the certificate wasn't used by an imposter. Use a secure system for signing such as CoSign (my company) or an equivalent system.
Since SHA-3 seems to be an already known function (Keccak as the finalist of NIST hash function competition) I have several questions related to this topic:
NIST site says that NIST is closed due to a lapse in government funding. Is there any chance that SHA-3 will ever be finally accepted?
The BouncyCastle library has an implementation of SHA-3 whose digest results are the same as the examples posted in the Wikipedia article (I tested this). Since the final standard is not approved, can this be trusted? Wikipedia says this is likely to change, but how can it change if the final algorithm does not seem to be subject to change (or else it would be another algorithm)?
Here someone noted that usage of PBKDF2 with SHA-3 for key strengthening and password hashing should be avoided. But I cannot understand why. How can it give an attacker an advantage if the algorithm is not fast?
I could not find test vectors anywhere to test my implementation of PBKDF2-HMAC-SHA3 in scala based on BouncyCastle java api. I can post my test spec with some results. But first can anybody post any/spec test vectors?
Here is my implementation in scala:
package my.crypto

import org.bouncycastle.crypto.digests.SHA3Digest
import org.bouncycastle.crypto.generators.PKCS5S2ParametersGenerator
import org.bouncycastle.crypto.PBEParametersGenerator
import org.bouncycastle.crypto.params.KeyParameter

object PBKDF2WithHmacSHA3 {
  def apply(password: String, salt: Array[Byte], iterations: Int = 65536, keyLen: Int = 256): Array[Byte] = {
    // PKCS5S2ParametersGenerator implements PBKDF2; the digest passed in is
    // used inside HMAC, so this derives keys with HMAC-SHA3-256.
    val generator = new PKCS5S2ParametersGenerator(new SHA3Digest(256))
    generator.init(
      PBEParametersGenerator.PKCS5PasswordToUTF8Bytes(password.toCharArray),
      salt,
      iterations
    )
    // keyLen is in bits; the derived key is returned as raw bytes.
    val key = generator.generateDerivedMacParameters(keyLen).asInstanceOf[KeyParameter]
    key.getKey
  }
}
One questionable thing for me is new SHA3Digest(256), the 256-bit length in the constructor: should it be the same as the provided key length, or a fixed one as I did? I decided to use a fixed length because only certain fixed values can be used, while an API user can provide any value as the key-length parameter, and most uncommon ones would result in an exception thrown from inside the SHA3Digest constructor. Also, the default value seems to be 288 (when no length is provided), which looks strange.
Thanks in advance!
Shutdown is temporary. SHA-3 will most likely be standardized at some point in 2014.
No, those values are probably for Final Round Keccak, not for SHA-3. There is no SHA-3 spec yet and it's quite likely that SHA-3 will be tweaked before standardization.
=> it's impossible to implement SHA-3 now, you can only implement Keccak.
Password hashes should be as expensive as possible for the attacker. The attacker uses different hardware from the defender, at minimum a GPU, but possibly even custom chips.
The defender has a limited time budget for a hash (e.g. 100 ms) and wants a function that's as expensive as possible for the attacker given that constraint. This means that custom hardware shouldn't gain a big advantage over a standard computer. So it's preferable to use a software-friendly hash, but Keccak is relatively hardware friendly.
SHA-1 and SHA-2 are decent in hardware as well, so in practice the difference is small compared to the advantage other password hashes have over PBKDF2-HMAC-SHA-x. If you care about security instead of standard conformance, I recommend scrypt.
Is it 'safe' to store cipher parameters in the (unencrypted) header of an encrypted file? Is there anything (other than the key of course!) that shouldn't be stored/transmitted in the clear?
You are using symmetric encryption, where storing the block size, block mode and key size would be safe, since, as you stated, you don't (and mustn't) make the keys available.
But all such params are in general useful to attackers. If the file cannot easily be associated with a cipher and the params used (or the software, respectively), an attacker has considerably more work to do, and that's basically what encryption is for. A cipher is secure while (and because) everyone can see how it works; additionally, trying to hide some information can add some security on top.
AES has a fixed block size of 128 bits, which is not critical information in itself once the use of AES is known. So this one is not needed in the file header.
The key size is given by the key itself, so it can be left out too.
The block mode is the remaining parameter. Just never use ECB. Stick to a single block mode such as OCB and you don't need to store it in the file either.
Predefining all params at both sides is a solution, if you don't intend to change them per file.
Error checking can be done using checksums, which are also critical information, so you may encrypt them together with the data or provide them together with the key.
Perhaps the following approaches can help if you have to transmit the params anyway:
Transmit the params in the key file, if you're up to defining the format yourself and the keys are distributed on a per-file basis.
You could also define different settings by mapping them to some randomly defined enumerators, which don't provide valuable information without knowing the software.
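Tying those points together, here is a hedged sketch in Scala of a file layout where everything except the random nonce is fixed by convention; it uses the JDK crypto API, and the object and method names are mine:

import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

object SealedFile {
  // Algorithm, mode, key size and tag length are agreed on both sides, so the
  // only per-file value kept in the clear "header" is the random nonce, which
  // is not secret. GCM also authenticates the ciphertext, which covers the
  // integrity role of a separate checksum.
  def seal(key: Array[Byte], plaintext: Array[Byte]): Array[Byte] = {
    val nonce = new Array[Byte](12)
    new SecureRandom().nextBytes(nonce)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, nonce))
    nonce ++ cipher.doFinal(plaintext) // 12-byte clear header, then ciphertext + tag
  }
}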