What is the maximum size of a JWT token? - jwt

I need to know the maximum length of a
JSON Web Token (JWT).
The specs contain no information about it. Could it be that there is no length limitation?

I've also been trying to find this.
I'd say - try and ensure it's below 7kb.
Whilst JWT defines no upper limit in the spec (http://www.rfc-editor.org/rfc/rfc7519.txt) we do have some operational limits.
As a JWT is included in an HTTP header, we have an upper limit (SO: Maximum on http header values) of 8K on the majority of current servers.
That 8K covers all request headers combined, so staying under 7KB leaves a reasonable amount of room for the other headers. The biggest risk to that limit is cookies (they are sent in headers and can grow large).
As the token is signed or encrypted and Base64-encoded, there is at least 33% size overhead relative to the original JSON string, so do check the length of the final encoded token.
One final point - proxies and other network appliances may apply an arbitrary limit along the way...
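As a rough sketch of that overhead, the following Python (stdlib only) builds an HS256-signed token by hand; the claim names, values, and secret are invented for illustration:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # Base64url without padding, as used by the JWS Compact Serialization
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_hs256_jwt(claims: dict, secret: bytes) -> str:
    # Header and claims are serialized compactly, then signed with HMAC-SHA256.
    header = {"alg": "HS256", "typ": "JWT"}
    encoded_header = b64url(json.dumps(header, separators=(",", ":")).encode())
    encoded_claims = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = encoded_header + "." + encoded_claims
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

claims = {"sub": "1234567890", "name": "John Doe", "roles": ["admin"] * 50}
token = make_hs256_jwt(claims, b"not-a-real-secret")
raw_len = len(json.dumps(claims, separators=(",", ":")))
print(f"claims JSON: {raw_len} bytes, encoded token: {len(token)} bytes")
```

The encoded token comes out roughly 4/3 the size of the serialized header and claims, plus a fixed-size signature, which is why checking the final encoded length matters more than counting claim bytes.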

As you said, there is no maximum length defined in RFC 7519 (https://www.rfc-editor.org/rfc/rfc7519) or the other RFCs related to JWS or JWE.
If you use the JSON Serialized format or JSON Flattened Serialized format, there is no limitation and there is no reason to define a limitation.
But if you use the JSON Compact Serialized format (most common format), you have to keep in mind that it should be as short as possible because it is mainly used in a web context. A 4kb JWT is something that you should avoid.
Take care to store only useful claims and header information.

When using Heroku, headers are limited to 8KB. Depending on how much data you put in the JWT, that limit can be reached. An oversized request will never touch your Node instance; the Heroku router drops it before your API layer.
When processing an incoming request, a router sets up an 8KB receive
buffer and begins reading the HTTP request line and request headers.
Each of these can be at most 8KB in length, but together can be more
than 8KB in total. Requests containing a request line or header line
longer than 8KB will be dropped by the router without being
dispatched.
See: Heroku Limits


Do SAML 2.0 Attribute Statements have a size limit?

As the title says, I'm wondering if SAML 2.0 Attribute Statements have a definite size limit? Or does the limit vary from IDP to IDP? Thanks!
As long as the Response is being sent via a POST transaction, or being retrieved via Attribute Query, there's no limit on sizing of the Response and its various Attribute statements. With the Redirect binding (uncommon for Responses for security reasons), the use of large Attributes will be governed by any limits that browsers may impose on size of the supported URL (as an example, I believe IE11 still has an 8K character limit on the length of the URL).

Should this GET call return 204 or 200 with a body?

Say we have a rest api which aggregates data about a Person from other services. One of the Aggregator service routes is GET /person/(person id)/driverinfo which tells us whether the person is a licensed driver or not, license id, expiry date of license and the number of traffic violations. These data can be picked up by the Aggregator from one or more other services. This api will be used by a web page to show the "driver info" about a person. It will also be tested with automation.
Currently, the api gives 204 no content response for persons who never had a driving license. This is because one of the underlying apis gives a 204 for that scenario. So, it was decided that the Aggregator should do the same.
But, I believe that this is not a good response. Instead, we should return 200 with appropriate values for the fields. For example, licensed=false, licenseId = N.A. etc. when the underlying api gives a 204. I.e. the Aggregator should generate these fields and their values.
Which approach do you think is better, and why?
204 means something specific in HTTP; it says that the server found a representation of the requested resource, and that representation is zero bytes long.
Therefore, the real question is more like "Should we use a zero byte long message to describe a situation?". Maybe? If all of the fields in your message schema are optional, and we are trying to describe a representation that means that all of the fields are taking on their default values, then a zero byte array might be the right way to communicate that.
Within the context of HTTP specifically, the headers themselves are already significant in length (compared to zero), so I wouldn't expect there to be particularly compelling performance reasons to squeeze a signal down to zero length. For instance, if we were normally passing around application/json, I would not expect sending an empty object or array to be much more expensive than sending nothing at all.

Why would ETags be set to a MUST requirement if you already have the resource?

Why would you set ETags to a "MUST requirement level"?
You obtain the resource before the ETag is returned...
I'm working on a project where I am the client that sends HTTP requests to a server that returns an HTTP Cache-Control header with ETags to cache the response (where on each additional request it gets compared to the If-None-Match header to determine if the data is stale and if a new request should be made). In my current project the ETags parameter is using the conditional GET architecture with the MUST requirement level as specified in RFC 2119.
MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.
I don't understand the intent of using a conditional GET with the MUST requirement level. From my understanding, the MUST requirement is there to limit (is that right?) the resources provided to the client that makes the request; however, the client (me in this case) already has the resources from the first request, and I can continue obtaining the same resource (or a fresher one if it gets updated) as much as I want with or without returning the If-None-Match and ETag header fields.
What would be the purpose of setting it to the MUST requirement level in this case if it's not limiting the resources returned, aside from being able to cache and limiting the number of requests to the server (I'm asking from the client's point of view; yes, I know I can cache it, but why the MUST requirement)? Isn't this only used for limiting resources?
So basically, doesn't it make this MUST requirement not a requirement if I can obtain the resources with or without it? Am I missing something here?
My Question is not asking the what and how Etags, Cache-Control, or If-None-Match headers work.
Thanks in advance, cheers!
Why would ETags be set to a MUST requirement if you already have the resource?
A client MUST use a conditional GET to reduce the data traffic.
Aside from being able to cache and limiting the amount of requests to the server
The number of requests stays the same, but the total amount of data transferred changes.
Using ETags in if-none-matched GET requests (conditional GET)
When you make an API call, the response header includes an ETag with a value that is the hash of the data returned in the API call. You store this ETag value for use in the next request.
The next time you make the same API call, you include the If-None-Match request header with the ETag value stored from the first step.
If the data has not changed, the response status code will be 304 – Not Modified and no data is returned.
If the data has changed since the last query, the data is returned as usual with a new ETag. The game starts again: you store the new ETag value and use it for subsequent requests.
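The loop above can be sketched from the server's side; using a SHA-256 hash of the body as the ETag is just one possible choice of validator, not something the spec mandates:

```python
import hashlib

def handle_get(body, if_none_match=None):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    if if_none_match == etag:
        # The client's cached copy is current: confirm with 304, send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

data = b'{"user": "alice"}'
status, headers, payload = handle_get(data)               # first request
status2, _, payload2 = handle_get(data, headers["ETag"])  # revalidation
print(status, status2)  # 200 304
```

The 304 carries only headers, which is exactly where the traffic saving comes from.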
Why?
The main reason for using conditional GET requests is to reduce data traffic.
Isn't this only used for limiting resources?
No...
You can ask an API for multiple resources in one request.
(Ok, that's also limiting resources by saving the other requests.)
You can prevent a method (e.g. PUT) from modifying an existing resource, when the client believes that the resource does not exist (replace protection).
I can obtain the resources with or without it?
When you ignore the "MUST use conditional GET" then (a) the traffic will increase and (b) you lose the "resource has changed" indication coming from the server side. You would have to implement the comparison handling on the client side: is the resource from the second request newer than the one from the first request?
I found my question wasn't asking the "right question" because I was merging my understanding of other headers (thanks to #dcerecedo's comment for pointing me in the right direction), which was affecting my understanding of why MUST was being used.
The MUST was more relevant to other headers, in my case private, max-age=3600 and must-revalidate
Where
Cache-Control: private restricts proxy servers from caching it; this helps you keep your data off a server you don't trust and prevents a proxy from caching user-specific data that's not relevant to everyone (like a user profile).
Cache-Control "max-age=3600, must-revalidate" tell both client caches and proxy caches that once the content is stale (older than 3600 seconds) they must revalidate at the origin server before they can serve the content. This should be the default behavior of caching systems, but the must-revalidate directive makes this requirement unambiguous.
Where after the max-age expires the client should revalidate. It might revalidate using the If-Match or If-None-Match headers with an ETag, or it might use the If-Modified-Since or If-Unmodified-Since headers with a date. So, after expiration the browser will check at the server if the file is updated. If not, the server will respond with a 304 Not Modified header and nothing is downloaded.
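As a sketch of the client-side decision once max-age is in play (this is a deliberately minimal check, not a full Cache-Control parser):

```python
import re
import time

def is_fresh(cache_control, stored_at, now=None):
    """True while a cached response is still fresh under its max-age directive."""
    now = time.time() if now is None else now
    m = re.search(r"max-age=(\d+)", cache_control)
    if m is None:
        return False  # no explicit freshness lifetime: revalidate
    return (now - stored_at) < int(m.group(1))

t0 = 1_000_000.0
cc = "private, max-age=3600, must-revalidate"
print(is_fresh(cc, t0, now=t0 + 100))   # True: still fresh, serve from cache
print(is_fresh(cc, t0, now=t0 + 4000))  # False: stale, revalidate (e.g. If-None-Match)
```

Only when the check comes back False does the conditional request with If-None-Match (or If-Modified-Since) get sent.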

HTTP Spec: PUT without data transfer, since hash of data is known to server

Does the HTTP/WebDav spec allow this client-server dialog?
client: I want to PUT data to /user1/foo.mkv which has this hash sum: HASH
server: OK, PUT was successful, you don't need to send the data since I already know the data with this hash sum.
Note: This PUT is an initial upload. It is not an update.
If this is possible, much faster file syncing could be implemented.
Use case: The WebDAV server hosts a directory for each user. The favorite video foo.mkv gets uploaded by several users. In this example the favorite video is already stored at this location: /user2/myfoo.mkv. The second and following uploads don't need to send any data, since the server already knows the content. This would reduce a lot of network load.
Preconditions:
Client and server would need to agree on the hash algorithm beforehand.
The server needs to store the hash-value of already known files.
It would be very easy to implement this in a custom client and server. But that's not what I want.
My question: Is there an RFC or other standard that allows such a dialog?
If there is no standard yet, then how to proceed to get this dream come true?
Security consideration
With the above dialog it would be possible to access the content of known hashes. For example, an evil client knows that there is a file with the hash sum of 1234567.... He could do the above two steps, and after that use a GET to download the data.
A way around this to extend the dialog:
client: I want to PUT data which has this hash sum: HASH
server: OK, PUT would be successful, but to be sure that you have the data, please send me the bytes N up to M. I need this to be sure you have the hash-sum and the data.
client: Bytes N up to M of the data are abcde...
server: OK, your bytes match mine. I trust you. Upload successful, you don't need to send the data any more.
How to get this done?
Since it seems that there is no spec yet, this part of the question remains:
How to proceed to get this dream come true?
From what you described, it seems like ETags should be used.
It was specifically designed to associate a tag (usually an MD5 hash, but can be anything) with a resource's content (and/or location) so you can later tell whether the resource has changed or not.
ETags are supported for PUT requests and are commonly used with the If-Match header for optimistic concurrency control.
However, your use case is slightly different as you are trying to prevent a PUT to a resource with the same content, whereas the If-Match header is used to only allow the PUT to a resource with the same content.
In your case, you can instead use the If-None-Match header:
The meaning of "If-None-Match: *" is that the method MUST NOT be
performed if the representation selected by the origin server (or by a
cache, possibly using the Vary mechanism, see section 14.44) exists,
and SHOULD be performed if the representation does not exist. This
feature is intended to be useful in preventing races between PUT
operations.
WebDAV also supports Etags though how it's used may depend on the implementation:
Note that the meaning of an ETag in a PUT response is not clearly
defined either in this document or in RFC 2616 (i.e., whether the ETag
means that the resource is octet-for-octet equivalent to the body of
the PUT request, or whether the server could have made minor changes
in the formatting or content of the document upon storage). This is an
HTTP issue, not purely a WebDAV issue.
If you are implementing your own client, I would do something like this:
Client sends a HEAD request to the resource to check the ETag
If the client sees that it matches what it has already, do not send anything else
If it doesn't match, then send the PUT request with the If-None-Match header
UPDATE
From your updated question, it now seems clear that when a PUT request is received, you want to check ALL resources on the server for the absence of the same content before the request is accepted. That means also checking resources which are in a different location than what was specified as the destination to the PUT request.
AFAIK, there's no existing spec to specifically handle this case. However, the ETag mechanism (and the HTTP protocol) was designed to be generic and flexible enough to handle many cases and this is one of them.
Of course, this just means you can't take advantage of standard HTTP server logic -- you'd need to custom code both the client and server side.
Assumptions
Before I get into possible implementations, there are some assumptions that need to be made.
As mentioned, you need to control both the server and the client
An algorithm needs to be agreed upon for generating the ETag based on the content. This can be MD5, SHA1, SHA2-256, SHA3, a concatenation of a combination of them, etc. I'll just mention the algorithm output as the ETag, but how you do it is up to you.
Possible implementations
These have been ordered from simplest to increasing complexity if the simple case doesn't work for you.
Possible implementation 1
This assumes your server implementation allows you to read the request headers and respond before the entire request is received.
Client computes the ETag for the file/resource to upload.
Client sends a PUT request to the server (location doesn't matter) with the header If-None-Match containing the ETag and continue sending the body normally.
Server checks to see if a resource with the ETag already exists.
Server:
If ETag already exists, immediately return a 412 response code. Optionally terminate the connection to stop the client from continuing to send the resource (NOTE: This is NOT advisable by the HTTP spec, though not explicitly prohibited. See note 1 below). Yes, a little bandwidth is wasted, but you wouldn't have to wait for the entire request to finish.
If ETag doesn't exist, wait for the request to finish normally.
Client:
If the 412 response is received, interpret it to mean that the resource already exists and the request needs to be aborted -- stop sending data.
Possible implementation 2
This is slightly more complex, but better adheres to the HTTP spec. Also, this MIGHT work if your server architecture doesn't allow you to read the headers before the entire request is received.
Client computes the ETag for the file/resource to upload.
Client sends a PUT request to the server (location doesn't matter) with the header If-None-Match containing the ETag and an Expect: 100-continue header. The request body is NOT yet sent at this point.
Server checks to see if a resource with the ETag already exists.
Server:
If ETag already exists, return a 412 response.
If ETag doesn't exist, send a 100 response and wait for the request to finish normally.
Client:
If the 412 response is received, interpret it to mean that the resource already exists and the request was therefore aborted.
If the 100 response is received, continue sending the body normally
Possible implementation 3
This implementation probably requires the most work but should be broadly compatible with all major libraries / architectures. There's a small risk of another client uploading a file with the same contents in between the two requests though.
Client computes the ETag for the file/resource to upload.
Client sends a HEAD request (no body) to the server at /check-etag/<etag> where <etag> is the ETag. This checks whether the ETag already exists at the server.
Server code at /check-etag/* checks to see if a resource with that ETag already exists.
Server:
If ETag already exists, return a 200 response.
If ETag doesn't exist, send a 404 response.
Client:
If the 200 response is received, interpret it to mean that the resource already exists, and do not proceed with a PUT request.
If the 404 response is received, follow up with a normal PUT request to the intended destination.
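The three steps can be sketched as client logic with the HTTP calls stubbed out; the /check-etag/ route and the helper names are conventions invented for this answer, not a standard:

```python
import hashlib

def upload(path, content, head, put):
    """Implementation 3 as client logic: probe /check-etag/<etag>, PUT only on 404.

    `head` and `put` stand in for an HTTP client, and the /check-etag/
    route is this answer's convention, not part of any standard.
    """
    etag = hashlib.sha256(content).hexdigest()
    if head("/check-etag/" + etag) == 200:
        return "skipped"  # server already has content with this hash
    put(path, content, headers={"If-None-Match": "*"})
    return "uploaded"

# In-memory stand-ins for the server side of the two requests.
known_etags = set()
def fake_head(url):
    return 200 if url.rsplit("/", 1)[1] in known_etags else 404
def fake_put(path, content, headers):
    known_etags.add(hashlib.sha256(content).hexdigest())

print(upload("/user1/foo.mkv", b"video-bytes", fake_head, fake_put))   # uploaded
print(upload("/user2/myfoo.mkv", b"video-bytes", fake_head, fake_put)) # skipped
```

The If-None-Match: * on the PUT guards the race mentioned below: if another client uploaded the same content between the two requests, the server can still reject the write.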
Considerations
Although the implementation is up to you, here are some points to consider:
When a resource is added or updated, the ETag and the location should be stored in a database for quick retrieval. It is needlessly inefficient for a server to recompute the hash for every single resource whenever a resource is being uploaded. There should also be an index on the ETag and location fields for quick retrieval.
If two clients upload a resource with the same ETag at the same time, you might want to abort the 2nd one as soon as the 1st one finishes.
Using hashes for ETag means that there's a possibility for collision (where two resource would have the same hash), though in practice, the possibility is extremely slim if a good hash is used. Note that MD5 is known to be weak to intentional collision attacks. If you are paranoid, you can concatenate multiple hashes to make collision a much smaller chance.
In regards to your "security consideration", I still don't see how knowing a hash would lead to retrieval of a resource. The server will only and SHOULD ONLY tell you whether a specific ETag exists or not. Without divulging the location, it's not possible for the client to retrieve the file. And even if the client knows the location, the server SHOULD implement other security controls such as authentication and authorization to restrict access. Using the resource location solely as a way of restricting access is just security by obscurity, especially since, from what you mentioned, the paths seem to follow a pattern based on username.
Notes
RFC 2616 indicates this SHOULD NOT be done:
If an origin server receives a request that does not include an Expect
request-header field with the "100-continue" expectation, the request
includes a request body, and the server responds with a final status
code before reading the entire request body from the transport
connection, then the server SHOULD NOT close the transport connection
until it has read the entire request, or until the client closes the
connection. Otherwise, the client might not reliably receive the
response message.
Also, DO NOT close the connection from the server side without sending any status codes, as the client will most likely retry the request:
If an HTTP/1.1 client sends a request which includes a request body,
but which does not include an Expect request-header field with the
"100-continue" expectation, and if the client is not directly
connected to an HTTP/1.1 origin server, and if the client sees the
connection close before receiving any status from the server, the
client SHOULD retry the request.

best approach to design a rest web service with binary data to be consumed from the browser

I'm developing a json rest web service that will be consumed from a single web page app built with backbone.js
This API will let the consumer upload files related to some entity, like pdf reports related to a project
Googling around and doing some research at stack overflow I came with these possible approaches:
First approach: base64 encoded data field
POST: /api/projects/234/reports
{
author: 'xxxx',
abstract: 'xxxx',
filename: 'xxxx',
filesize: 222,
content: '<base64 encoded binary data>'
}
Second approach: multipart form post:
POST: /api/projects/234/reports
{
author: 'xxxx',
abstract: 'xxxx',
}
as a response I'll get a report id, and with that I shall issue another post
POST: /api/projects/234/reports/1/content
enctype=multipart/form-data
and then just send the binary data
(have a look at this: https://stackoverflow.com/a/3938816/47633)
Third approach: post the binary data to a separate resource and save the href
first I generate a random key at the client and post the binary content there
POST: /api/files/E4304205-29B7-48EE-A359-74250E19EFC4
enctype=multipart/form-data
and then
POST: /api/projects/234/reports
{
author: 'xxxx',
abstract: 'xxxx',
filename: 'xxxx',
filesize: 222,
href: '/api/files/E4304205-29B7-48EE-A359-74250E19EFC4'
}
(see this: https://stackoverflow.com/a/4032079/47633)
I just wanted to know if there's any other approach I could use, the pros/cons of each, and if there's any established way to deal with this kind of requirement
the big con I see with the first approach is that I have to fully load and Base64-encode the file on the client
some useful resources:
Post binary data to a RESTful application
What is a good way to transfer binary data to a HTTP REST API service?
How do I upload a file with metadata using a REST web service?
Bad idea to transfer large payload using web services?
https://stackoverflow.com/a/5528267/47633
My research results:
Single request (data included)
The request contains metadata. The data is a property of the metadata and is encoded (for example: Base64).
Pros:
transactional
always valid (no missing metadata or data)
Cons:
encoding makes the request very large
Examples:
Twitter
GitHub
Imgur
Single request (multipart)
The request contains one or more parts with metadata and data.
Content types:
multipart/form-data
multipart/mixed
multipart/related
Pros:
transactional
always valid (no missing metadata or data)
Cons:
content type negotiation is complex
content type for data is not visible in WADL
Examples:
Confluence (with parts for data and for metadata)
Jira (with one part for data, metadata only part headers for file name and mime type)
Bitbucket (with one part for data, no metadata)
Google Drive (with one part for metadata and one for part data)
Single request (metadata in HTTP header and URL)
The request body contains the data and the HTTP header and the URL contains the metadata.
Pros:
transactional
always valid (no missing metadata or data)
Cons:
no nested metadata possible
Examples:
S3 GetObject and PutObject
Two requests
One request for metadata and one or more requests for data.
Pros:
scalability (for example: data request could go to repository server)
resumable (see for example Google Drive)
Cons:
not transactional
not always valid (before the second request, one part is missing)
Examples:
Google Drive
YouTube
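To make the multipart variants concrete, here is a sketch of how a multipart/related body is framed; the part headers are illustrative (real APIs such as Google Drive define their own conventions):

```python
import json
import uuid

def multipart_related(metadata, file_bytes, file_type="application/pdf"):
    """Frame one JSON metadata part and one binary part as a multipart/related body.

    The part layout loosely mirrors the Google Drive style of upload; exact
    part headers vary by API, so treat these as illustrative only.
    """
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\nContent-Type: application/json\r\n\r\n"
        f"{json.dumps(metadata)}\r\n".encode()
        + f"--{boundary}\r\nContent-Type: {file_type}\r\n\r\n".encode()
        + file_bytes
        + f"\r\n--{boundary}--\r\n".encode()
    )
    content_type = f'multipart/related; boundary="{boundary}"'
    return content_type, body

ctype, body = multipart_related({"author": "xxxx", "abstract": "xxxx"}, b"%PDF-1.4 ...")
print(ctype)
```

Both parts travel in one request, which is what makes this variant transactional, at the price of the content-type negotiation complexity noted above.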
I can't think of any other approaches off the top of my head.
Of your 3 approaches, I've worked with method 3 the most. The biggest difference I see is between the first method and the other two: separating metadata and content into two resources
Pro: Scalability
while your solution involves posting to the same server, this can easily be changed to point the content upload to a separate server (i.e. Amazon S3)
In the first method, the same server that serves metadata to users will have a process blocked by a large upload.
Con: Orphaned Data/Added complexity
failed uploads (either metadata or content) will leave orphaned data in the server DB
Orphaned data can be cleaned up with a scheduled job, but this adds code complexity
Method 2 reduces the orphan possibilities, at the cost of longer client wait time, as you're blocking on the response of the first POST
The first method seems the most straightforward to code. However, I'd only go with it if you anticipate this service being used infrequently and you can set a reasonable limit on user file uploads.
I believe the ultimate method is number 3 (separate resource), for the main reason that it allows maximizing the value I get from the HTTP standard, which matches how I think of REST APIs. For example, and assuming a well-grounded HTTP client is in use, you get the following benefits:
Content compression: You optimize by allowing servers to respond with compressed result if clients indicate they support, your API is unchanged, existing clients continue to work, future clients can make use of it
Caching: If-Modified-Since, ETag, etc. Clients can avoid refetching the binary data altogether
Content type abstraction: For example, you require an uploaded image, it can be of types image/jpeg or image/png. The HTTP headers Accept and Content-type give us some elegant semantics for negotiating this between clients and servers without having to hardcode it all as part of our schema and/or API
On the other hand, I believe it's fair to conclude that this method is not the simplest if the binary data in question is not optional. In which case the Cons listed in Eric Hu's answer will come into play.