Why is a JWT split into three dot-delimited parts?

A JSON Web Token (JWT) is split into three Base64-encoded parts, which are separated by periods ("."). The first two parts encode JSON objects: the first is a header describing the signature and hashing algorithm, and the second contains the assertions (claims). The third part is binary data: the signature itself.
My question is: why is the JSON Web Token split into three separate parts like this? It seems like it would have made parsing them a lot easier to have encoded them as a single JSON object, like so (the example below is incomplete for brevity's sake):
{
  "header": {
    "alg": "rsa"
  },
  "assertions": {
    "iss": "2019-10-09T12:34:56Z"
  },
  "sig": "qoewrhgoqiethgio3n5h325ijh3=="
}
Stated differently: why didn't the designers of JWT just put all parts of the JWT in a single JSON object like shown above?

IMHO, it would cause more issues. Yes, you could parse it nicely, but what about verifying the signature?
The structure of a JWT is <B64 String>.<B64 String>.<B64 String>. The signature is basically the first two parts signed. It is unlikely that this structure will be modified by various frameworks in any way.
Now consider JSON: during serialisation and deserialisation the order of elements may change. The objects {"a":1,"b":2} and {"b":2,"a":1} might be equal in JavaScript, but if you stringify them they will generate different signatures.
Also, to check the signature you would need to decide upon a standard form of JSON to be used when generating the signature (for instance, beautified or minified). Again, different choices will generate different signatures.
As a result, there are more hassles than benefits to simply using JSON.
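For example, here is a toy C# sketch of that pitfall (the class and values are purely illustrative, not tied to any JWT library): hashing two JSON encodings of the same logical object gives different digests, so a signature computed over one encoding would fail to verify against the other.
using System;
using System.Security.Cryptography;
using System.Text;

class JsonOrderDemo
{
    static void Main()
    {
        string a = "{\"a\":1,\"b\":2}";
        string b = "{\"b\":2,\"a\":1}";

        using var sha = SHA256.Create();
        // Same logical object, different bytes, therefore different digests/signatures.
        Console.WriteLine(BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(a))));
        Console.WriteLine(BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(b))));
    }
}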

While I'm not speaking for the people who designed the JWT, I can think of one major reason why your suggestion won't fly:
Headers don't allow newlines
Remember that a primary use-case for a JWT is to use it as a cookie value. Cookies get passed up in headers. Header values don't support newlines: each header key/value pair needs to fit on one line.
Therefore, arbitrary JSON will not work for something that is meant to be passed as a header value in an HTTP request.
Therefore, some sort of encoding is required - which is why base64 is used in the first place. The reason base64 shows up so often is that it converts any blob or string into something that can be reliably transported as plain ASCII in almost any circumstances. I.e. three base64-encoded "payloads" separated by periods (a character that isn't valid in base64 encoding) are pretty much guaranteed to transport safely, without mangling, between just about any two systems.
JSON cannot make the same guarantees. Of course, you could remove the newlines (JSON ignores whitespace anyway), but quotes are still a problem: they should be encoded in URLs, and maybe not in other circumstances, although they would probably be okay in HTTP headers. As a result it just becomes one more "gotcha" as a developer tries to implement it for the first time.
I'm sure there are other reasons too: this isn't meant to be a comprehensive list.

The signature cannot be part of what is signed, therefore it has to be separate.
The header and payload could be combined into one JSON object, but it would be bad design. It is the job of your JWT library to check the headers and verify the signature. That can (and should) be done without concern for the payload. It is the job of your application to react to the payload. As long as the signature checks out, that can be done without concern for the headers.
Separate concerns, separate objects.

Related

What is the reason for encoding JWT?

To the best of my knowledge regarding the security of JWT, if the JWT token doesn’t contain sensitive information, we only use its signature feature so that its content cannot be manipulated. Otherwise, if it contains sensitive data it can be encrypted to protect the data from sniffing. Also, both of them can be employed if needed.
However, what I cannot understand is that why the token is not a plain json? Why is it encoded while it can be easily decoded? Does it have a security reason or there is another reason behind that?
I searched the net and also took a quick look at RFC 7519 but I couldn’t find any clear and convincing answers.
Mostly to ease processing of JWTs.
A JWT is represented as a sequence of URL-safe parts separated by period ('.') characters. Each part contains a base64url-encoded value.
This ensures a) that the entire token is URL-safe, which simplifies things for a technology that's mostly used in a web context; and b) that the "parts" are easy to process, since the part separator ('.') is guaranteed not to occur inside the parts themselves. If it were plain JSON, a period could appear anywhere within the value itself, and you'd need to apply more complex, JSON-aware parsing to find the separate parts. But given the guarantee that a part cannot contain periods, because it is base64url-encoded, the parsing algorithm is simple:
Verify that the JWT contains at least one period ('.') character.
Let the Encoded JOSE Header be the portion of the JWT before the first period ('.') character.
Base64url decode the Encoded JOSE Header following the restriction that no line breaks, whitespace, or other additional characters have been used.
...
(All excerpts from the RFC.)
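As a rough illustration of those steps, here is a hand-rolled C# sketch (it is not the RFC's reference code, and the token it builds is a toy with a fake signature):
using System;
using System.Text;

static class CompactJwtDemo
{
    // base64url without padding, as used by the JWS compact serialization
    static string B64UrlEncode(byte[] data) =>
        Convert.ToBase64String(data).TrimEnd('=').Replace('+', '-').Replace('/', '_');

    static byte[] B64UrlDecode(string s)
    {
        s = s.Replace('-', '+').Replace('_', '/');
        s += new string('=', (4 - s.Length % 4) % 4);   // restore the stripped padding
        return Convert.FromBase64String(s);
    }

    static void Main()
    {
        // Build a toy token; a real one would carry a valid signature over the first two parts.
        string header  = B64UrlEncode(Encoding.UTF8.GetBytes("{\"alg\":\"HS256\",\"typ\":\"JWT\"}"));
        string payload = B64UrlEncode(Encoding.UTF8.GetBytes("{\"iss\":\"example\"}"));
        string token   = header + "." + payload + "." + B64UrlEncode(new byte[] { 1, 2, 3 });

        string[] parts = token.Split('.');              // '.' never appears inside a part
        Console.WriteLine(Encoding.UTF8.GetString(B64UrlDecode(parts[0])));   // header JSON
        Console.WriteLine(Encoding.UTF8.GetString(B64UrlDecode(parts[1])));   // payload JSON
    }
}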

How to Fingerprint a JWK?

Is there a standard, canonical method for creating a fingerprint (aka thumbprint) for a JWK?
From what I was reading it seems that the standard doesn't define how a kid should be specified, which I find odd. To me it makes the most sense to have it be a deterministic value rather than one that requires a lookup table, such that others could easily recreate the key id by virtue of possessing the public key.
I am aware that SSH fingerprints and X.509 thumbprints are standardized, but those don't seem like a suitable solution for all environments where JWKs are used (especially browsers), because they are too complex for naive implementations, and including the libraries capable of manipulating them (e.g. forge) would waste a lot of memory, bandwidth, and VM compile time.
Update
Officially it's called a "thumbprint" not a "fingerprint".
I think RFC 7638 will answer your question.
This RFC describes a way to compute a hash value over a JWK.
It is really easy to implement:
Keep the required parameters only. For an RSA key: kty, n and e; for an EC key: crv, kty, x and y.
Sort those parameters in lexicographic order: e, kty and n.
Compute the parameters and values into JSON: {"e":"AQAB","kty":"RSA","n":"0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx4cbbfAAtVT86zwu1RK7aPFFxuhDR1L6tSoc_BJECPebWKRXjBZCiFV4n3oknjhMstn64tZ_2W-5JsGY4Hc5n9yBXArwl93lqt7_RN5w6Cf0h4QyQ5v-65YGjQR0_FDW2QvzqY368QQMicAtaSqzs8KJZgnYb9c7d0zgdAZHzu6qMQvRL5hajrn1n91CbOpbISD08qNLyrdkt-bFTWhAI4vMQFh6WeZu0fM4lFd2NcRwr3XPksINHaQ-G_xBniIqbw0Ls1jF44-csFCur-kEgU8awapJzKnqDKgw"}
Hash it using SHA-256 and encode the result as base64url: NzbLsXh8uDCcd-6MNwXF4W_7noWXFZAfHkxZsRGC9Xs
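For what it's worth, here is a rough C# sketch of those steps (the key members below are shortened placeholders, not a real key):
using System;
using System.Security.Cryptography;
using System.Text;

class JwkThumbprintDemo
{
    static void Main()
    {
        // Placeholder RSA public-key members (base64url); substitute a real key's values.
        string e = "AQAB";
        string n = "0vx7...Kgw";

        // Required members only, in lexicographic order, with no whitespace.
        string canonical = "{\"e\":\"" + e + "\",\"kty\":\"RSA\",\"n\":\"" + n + "\"}";

        using var sha = SHA256.Create();
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(canonical));

        // base64url-encode the digest to get the thumbprint.
        string thumbprint = Convert.ToBase64String(hash)
            .TrimEnd('=').Replace('+', '-').Replace('/', '_');
        Console.WriteLine(thumbprint);
    }
}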
I don't believe there is a true standard, but this topic has been discussed in the IETF mailing archives. While the conversation seemed to get a little side-tracked by whether or not canonical JSON was a good idea in general, there was one method that seems reasonable as a standard fingerprinting method.
Remove all "metadata" fields from the JWK (where in this case "metadata" is defined as any non-required key, ie anything but "kty" and the parameters for the encryption algorithm defined by the JWA RFC-7518).
Convert stripped JWK into "canonical" JSON (sort keys lexicographically, no leading or trailing whitespace, and no whitespace between tokens).
Compute digest over created JSON string.
There is also no true standard that I am aware of for canonical JSON, but all the sources I've seen agree on at least the rules listed above (which are the only rules that should be relevant for the types of objects used for JWKs).

Is there a standard on how to sign primitive types?

I am designing a protocol to exchange IOUs (digital promissory notes).
These should be digitally signed, but the signature should be independent of the data representation (whether it's XML, JSON, binary, little- or big-endian numbers).
Is there any standard on how to sign a list of strings and primitive types (like integers, floating points, booleans)?
There isn't one standard encoding, but you can specify canonical forms for particular encodings.
For JSON you could specify that there is no whitespace outside strings and that keys should be sorted in a particular way (a rough sketch of this idea follows below).
For ASN.1 there is DER encoding, which is the canonical form of BER.
There is Cryptographic Message Syntax (CMS), but I don't know much about it.
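To make the canonical-form idea concrete, here is a minimal hand-rolled sketch (the field names and the line-based canonical form are invented for illustration, not any standard): the fields are serialized in a fixed order and format, and the signature is computed over exactly those bytes. Any representation (XML, JSON, binary) would be reduced to this same byte string before verification.
using System;
using System.Security.Cryptography;
using System.Text;

class CanonicalSigningDemo
{
    static void Main()
    {
        // Canonical form: keys sorted, one "key=value" per line, fixed number formats.
        string canonical = string.Join("\n",
            "amount=125.50",
            "creditor=bob",
            "currency=EUR",
            "debtor=alice");
        byte[] payload = Encoding.UTF8.GetBytes(canonical);

        using RSA rsa = RSA.Create(2048);
        byte[] signature = rsa.SignData(payload, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

        // The verifier rebuilds the identical canonical byte string and checks the signature.
        Console.WriteLine(rsa.VerifyData(payload, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1));
    }
}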
The better question is: what is the best format for verifying digitally signed data primitives?
The answer is XML, formatted and signed according to the XAdES standard. XAdES is harmonized with the related standards, and many implementations participate in interoperability tests hosted by ETSI.
Unless it is easy to verify a digitally signed format, the signature has limited value.
You can sign any bit stream and store/maintain the signature as a detached signature. But then you and the relying parties (the recipients) need to deal with two files: one for the data and one for the signature.
The advantage of XML with XAdES is that the format enables the signed XML file to include the digital signature.
You can create an equivalent of XAdES for another data format such as JSON. But a new format has limited use unless it becomes popular and standardized. XAdES has already accomplished this, so it is the way to go.
Added
Re: comment--
I want to provide non-repudiation. I understand that I have to save the information I signed. But I was hoping that I don't have to save it as XML but could rather save all values included in the signature in a database (less verbosely) and uniquely reconstruct the signed string from them before verifying.
Technically, you can do that. You'll need to watch out for spacing issues within the XML. But practically, it's not a good idea. Why:
Proving non-repudiation requires that you meet the applicable burden of proof that the alleged signer really did sign the data.
You may be trying to convince the original signer of this, an expert third party (an auditor) or non-experts (lawyers and juries). You want to make it easy and simple to convince these people. Schemes such as "re-creating" the signed file are not simple to understand compared with "here is the original signed file. Its signature verifies and it was signed with the digital certificate belonging to Susan Signer."
To keep it simple, I'd suggest signing an XAdES XML file. Then extract the data from the file and use it in your DBMS. Hang on to the original signed file in your DBMS or elsewhere. In case of a dispute, produce the original file and show that it verifies. A second part of the audit would be to show that your DBMS has the same data values as the signed XML.
The programming and storage costs of hanging on to the original, signed, xml file are de minimis, when compared with your goal of proving non-repudiation of the data.
By the way, how is the signer's certificate managed? If it is anything less than a QSCD (Qualified Signature Creation Device), such as storing the cert in the file system, then you have another problem: no way to conclusively prove that the certificate wasn't used by an imposter. Use a secure system for signing such as CoSign (my company) or an equivalent system.

RESTful design for handling requests differentiated as either JSON or multipart/form-data

I am designing a REST API for text analysis, and I want to accept submitted content as either JSON or in file format (which is more convenient for large submissions). The JSON format has a "text" field, from which the text is extracted. Uploaded files just contain raw textual content. So I will be handling these two types of content as either a JSON-encoded body, or in the latter case as a multipart/form-data upload. My question is really related to how I should design my API with respect to handling these two different types of submissions, whilst remaining as RESTful as possible.
My initial thought was to have two different endpoints: /json for JSON and /files for uploaded files. It doesn't seem right, however, to have two endpoints differentiated only by the type of content submitted by clients rather than by functionality. As an alternative I then considered differentiating according to request method, using PUT for JSON and POST for files. Again this seems wrong, since it is at odds with the semantics of the request methods themselves.
It seems the only alternative is to accept the two types of encodings via the same endpoint. I'm still not sure if this is the right way to proceed from a design perspective, however, hence my question. I guess this is precisely what the Content-Type header is for, as stated here. But there seems to be a more radical distinction between JSON and multipart/form-data than between JSON and XML.
When doing REST design my inclination is to stick with:
Unified methods (PUT is PUT regardless of Content-Type, POST is always POST)
I think that Content-Type is ultimately the correct differentiator, but if you'd prefer to embed it in the URL, I'd go with a content suffix of some sort.
.json for application/json and .file for form-data?
Once you move afield of Content-Type as the differentiator, it's all rather wibbly-wobbly and freeform at that point.
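As a very rough sketch of dispatching on Content-Type at a single endpoint (HttpListener is used only to keep the example self-contained; the URL and route are made up):
using System;
using System.Net;

class SubmissionServer
{
    static void Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/analyses/");
        listener.Start();

        while (true)
        {
            HttpListenerContext ctx = listener.GetContext();
            string contentType = ctx.Request.ContentType ?? "";

            if (contentType.StartsWith("application/json"))
            {
                // parse the JSON body and pull out the "text" field here
            }
            else if (contentType.StartsWith("multipart/form-data"))
            {
                // parse the multipart body and read the uploaded file here
            }
            else
            {
                ctx.Response.StatusCode = 415;   // Unsupported Media Type
            }
            ctx.Response.Close();
        }
    }
}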

Approaches to programming application-level protocols?

I'm doing some simple socket programming in C#. I am attempting to authenticate a user by reading the username and password from the client console, sending the credentials to the server, and returning the authentication status from the server. Basic stuff. My question is, how do I ensure that the data is in a format that both the server and client expect?
For example, here's how I read the user credentials on the client:
Console.WriteLine("Enter username: ");
string username = Console.ReadLine();
Console.WriteLine("Enter plassword: ");
string password = Console.ReadLine();
StreamWriter clientSocketWriter = new StreamWriter(new NetworkStream(clientSocket));
clientSocketWriter.WriteLine(username + ":" + password);
clientSocketWriter.Flush();
Here I am delimiting the username and password with a colon (or some other symbol) on the client side. On the server I simply split the string using ":" as the token. This works, but it seems sort of... unsafe. Shouldn't there be some sort of delimiter token that is shared between client and server so I don't have to just hard-code it in like this?
It's a similar matter for the server response. If the authentication is successful, how do I send a response back in a format that the client expects? Would I simply send a "SUCCESS" or "AuthSuccessful=True/False" string? How would I ensure the client knows what format the server sends data in (other than just hard-coding it into the client)?
I guess what I am asking is how to design and implement an application-level protocol. I realize it is sort of unique to your application, but what is the typical approach that programmers generally use? Furthermore, how do you keep the format consistent? I would really appreciate some links to articles on this matter as well.
Rather than reinvent the wheel, why not code up an XML schema and send and receive XML "files"?
Your messages will certainly be longer, but with gigabit Ethernet and ADSL this hardly matters these days. What you do get is a protocol where all the issues of character sets and complex data structures have already been solved, plus an embarrassment of riches in tools and libraries to support and ease your development.
I highly recommend using plain ASCII text if at all possible.
It makes bugs much easier to detect and fix.
Some common, machine-readable ASCII text protocols (roughly in order of complexity):
netstring
Tab Delimited Tables
Comma Separated Values (CSV) (strings that include both commas and double-quotes are a little awkward to handle correctly)
INI file format
property list format
JSON
YAML Ain't Markup Language
XML
The world is already complicated enough, so I try to use the least-complex protocol that would work.
Sending two user-generated strings from one machine to another -- netstrings is the simplest protocol on my list that would work for that, so I would pick netstrings.
(Netstrings will work fine even if the user types in a few colons or semi-colons or double-quotes or tabs -- unlike other formats that choke on certain commonly-typed characters.)
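For example, a hand-rolled netstring reader/writer might look like this (just a sketch, with no error handling; Netstrings is my own helper name, not a library):
using System.IO;
using System.Text;

static class Netstrings
{
    // Each field goes on the wire as "<decimal length>:<bytes>,".
    public static void Write(Stream s, string value)
    {
        byte[] payload = Encoding.UTF8.GetBytes(value);
        byte[] header = Encoding.ASCII.GetBytes(payload.Length + ":");
        s.Write(header, 0, header.Length);
        s.Write(payload, 0, payload.Length);
        s.WriteByte((byte)',');
    }

    public static string Read(Stream s)
    {
        var length = new StringBuilder();
        int c;
        while ((c = s.ReadByte()) != ':')   // read the decimal length up to ':'
            length.Append((char)c);
        int n = int.Parse(length.ToString());

        byte[] payload = new byte[n];
        int read = 0;
        while (read < n)                    // keep reading until the full payload arrives
            read += s.Read(payload, read, n - read);
        s.ReadByte();                       // consume the trailing ','
        return Encoding.UTF8.GetString(payload);
    }
}
In the earlier client example, you would then call Netstrings.Write(stream, username) and Netstrings.Write(stream, password) instead of joining the two with a ':'.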
I agree that it would be nice if there existed some way to describe a protocol in a single shared file, such that both the server and the client could somehow "#include" or otherwise use that protocol.
Then when I fix a bug in the protocol, I could fix it in one place, recompile both the server and the client, and then things would Just Work -- rather than digging through a bunch of hard-wired constants on both sides.
Kind of like the way well-written C code and C++ code uses function prototypes in header files so that the code that calls the function on one side, and the function itself on the other side, can pass parameters in a way that both sides expect.
Tell me if you discover anything like that, OK?
Basically, you're looking for a standard. "The great thing about standards is that there are so many to choose from". Pick one and go with it, it's a lot easier than rolling your own. For this particular situation, look into Apache "basic" authentication, which joins the username and password and base64-encodes it, as one possibility.
I have worked with two main approaches.
The first is an ASCII-based protocol.
An ASCII-based protocol is usually built around a set of text commands that terminate on some defined delimiter (like a carriage return or semicolon, or XML or JSON). If your protocol is a command-based protocol where there is not a lot of data being transferred back and forth, then this is the best way to go.
FIND\r
DO_SOMETHING\r
It has the advantage of being easy to read and understand because it is text based.
The disadvantage (it may not be a problem, but it can be) is that an unknown number of bytes may be transferred back and forth between the client and the server. So if you need to know exactly how many bytes are being sent and received, this may not be the type of protocol you want.
The other type of protocol is binary-based, with fixed-size headers that carry the size of each message. This has the advantage that you know exactly how much data the client is expected to receive. It can also potentially save you bandwidth, depending on what you're sending across (although ASCII can save you space too; it depends on your application's requirements). The disadvantage of a binary-based protocol is that it is difficult to understand just by looking at it, requiring you to constantly consult the documentation.
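For example, a bare-bones sketch of such framing (the 4-byte type plus 4-byte length header layout is just an illustration, not any particular wire standard):
using System.IO;
using System.Text;

static class BinaryFraming
{
    // Fixed 8-byte header: 4-byte message type, then 4-byte payload length,
    // followed by exactly that many payload bytes.
    public static void WriteFrame(Stream s, int messageType, byte[] payload)
    {
        using var w = new BinaryWriter(s, Encoding.UTF8, leaveOpen: true);
        w.Write(messageType);
        w.Write(payload.Length);            // tells the receiver how much to read
        w.Write(payload);
    }

    public static (int type, byte[] payload) ReadFrame(Stream s)
    {
        using var r = new BinaryReader(s, Encoding.UTF8, leaveOpen: true);
        int type = r.ReadInt32();
        int length = r.ReadInt32();
        return (type, r.ReadBytes(length));
    }
}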
In practice, I tend to mix both strategies in protocols I have defined based on my application's requirements.