Are CSRF tokens base64url encoded?

I'm working on an application that parses a CSRF token from a cookie header. I'd like to know whether CSRF tokens are base64 encoded with URL-safe characters (cf. https://simplycalc.com/base64url-encode.php) so that I can match them with the regular expression
[.a-zA-Z0-9_-]+
I was able to find documentation on JSON web tokens (JWTs) indicating that they consist of base64url-encoded portions separated by periods ('.'), but I wasn't able to find similar documentation on CSRF tokens.
Are CSRF tokens also generally limited to a certain character set, or can they contain any characters?

A CSRF token is an opaque "nonce" that carries no information -- the token in the form submission and the token in the cookie or header simply have to match. If you see them base64-encoded, that's just for convenience of transmission; it won't decode to anything useful, just random bytes most of the time. Nothing like the JSON structure of a JWT.
Looking at my current framework (Laravel), its CSRF tokens are just random strings (they're derived from base64, but not valid base64). Chances are that's the case for most other frameworks too.
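To make the "they just have to match" point concrete, here is a minimal sketch of a double-submit check; the class and method names are hypothetical, and the point is that the token is compared as an opaque string, never decoded:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Minimal double-submit CSRF check (hypothetical names, not any framework's API).
// The token is treated as an opaque string: no base64 decoding, no parsing.
public class CsrfCheck {
    static boolean tokensMatch(String cookieToken, String formToken) {
        if (cookieToken == null || formToken == null) return false;
        // Constant-time comparison, so timing doesn't leak how many characters matched.
        return MessageDigest.isEqual(
                cookieToken.getBytes(StandardCharsets.UTF_8),
                formToken.getBytes(StandardCharsets.UTF_8));
    }
}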

Related

Should SAMLResponse contain extra line breaks?

I have been working on a solution that retrieves a SAMLResponse from a third-party IdP, and we simply decode that SAMLResponse with the JDK Base64 decoder.
However, in one case we receive a SAMLResponse with line breaks (\n) after some characters, and when we try to decode it with:
...
byte[] base64DecodedResponse = Base64.getDecoder().decode(authnResponse);
...
This authnResponse is the SAMLResponse from the HTTP header, which contains \n newlines, and it fails to decode in the code above.
I have been looking for confirmation of whether any SAMLResponse received by an SP must be in Base64-encoded format, and hence should never contain line breaks, or whether it can contain them and the SP should handle it.
Applying a fix from the SP side is simple: .replaceAll("\n","") will do the job. But is it really industry standard to EDIT the SAMLResponse?
You probably need to use Base64.getUrlDecoder().
SAML2 is supposed to be encoded in base64url, i.e. Base 64 Encoding with URL and Filename Safe Alphabet: https://www.rfc-editor.org/rfc/rfc4648#section-5
However, getUrlDecoder should also reject embedded newlines in the base64, so it may not do you any good.
I would be interested in knowing which SAML provider you are using.
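To illustrate that caveat, a quick sketch (mine, not the original poster's): java.util.Base64's URL-safe decoder throws on any character outside its alphabet, newlines included.
import java.util.Base64;

public class UrlDecoderNewlines {
    public static void main(String[] args) {
        String withNewline = "Zm9v\nYmFy"; // "foo" + newline + "bar", both valid base64 chunks
        try {
            Base64.getUrlDecoder().decode(withNewline);
        } catch (IllegalArgumentException e) {
            // The URL-safe decoder rejects the embedded '\n' rather than skipping it.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}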
For those looking for wisdom here:
Editing a SAMLResponse after it's signed is bad practice.
According to the SAML documentation, the SAMLResponse encoding can be either base64 Content-Transfer-Encoding (RFC 2045) or Base64 Encoding (RFC 4648).
From the SAML 2.0 core doc, section 5:
Profiles MAY specify alternative signature mechanisms such as S/MIME or signed Java objects that contain SAML documents. Caveats about retaining context and interoperability apply
This justifies that SPs should be able to handle both standard and MIME decoding, hence a try/catch block falling back to Base64.getMimeDecoder() to get around this issue.
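A minimal sketch of that fallback, assuming the response arrives as a plain String (the helper name is hypothetical): try the strict RFC 4648 decoder first, then fall back to the MIME decoder, which skips the line breaks that RFC 2045 permits.
import java.util.Base64;

public class SamlDecode {
    // Hypothetical helper: strict decode first, MIME fallback second.
    static byte[] decodeSamlResponse(String authnResponse) {
        try {
            return Base64.getDecoder().decode(authnResponse);
        } catch (IllegalArgumentException e) {
            // RFC 2045 (MIME) base64 allows line breaks; this decoder skips them.
            return Base64.getMimeDecoder().decode(authnResponse);
        }
    }
}
This avoids editing the signed SAMLResponse: nothing is rewritten; the decoder simply tolerates the transfer encoding.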

Why do you use base64 URL encoding with JSON web tokens?

The Scenario:
I'm reading about JSON web tokens at this link (https://medium.com/vandium-software/5-easy-steps-to-understanding-json-web-tokens-jwt-1164c0adfcec). It outlines how to create a JSON web token: you create a header and a payload, and then create a signature using the following pseudocode:
data = base64urlEncode( header ) + "." + base64urlEncode( payload )
hashedData = hash( data, secret )
signature = base64urlEncode( hashedData )
My Question:
Why does the pseudocode use base64urlEncode when creating data and signature?
Scope Of What I Understand So Far:
Base64 lets you express binary data using the 64 text characters of the Base64 alphabet. It is typically used when you want to pass data through a channel that might misinterpret some of the characters, but will not misinterpret Base64 characters. Base64 URL encoding is analogous, except that it uses a variant of the Base64 alphabet that excludes characters with special meaning in URLs, so that a Base64url-encoded string placed in a URL won't be misinterpreted.
Assuming my understanding is correct, I'm trying to understand why base64urlEncode() is used when computing data and signature in the pseudocode above. Is the signature of a JSON web token going to be used somewhere in a URL? If so, why is data base64url-encoded as well before hashing? Why not just encode the signature? Is there something about the hash function that would require its data parameter to be Base64 URL encoded?
When using the OAuth Implicit Grant, JWTs may be transferred as part of URL fragments.
That is just one example, but I guess in general it was presumed that JWTs might be passed through URLs, so base64url-encoding them makes sense.
The first line of the IETF JWT standard abstract even says:
JSON Web Token (JWT) is a compact, URL-safe means of representing claims to be transferred between two parties.
(Note that the OAuth Implicit Grant is no longer recommended to be used.)
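To make the difference concrete, here is a small sketch (not from the linked article) that encodes the same three bytes both ways; standard base64 can emit '+', '/' and '=', all of which are reserved in URLs, while the URL-safe alphabet substitutes '-' and '_':
import java.util.Base64;

public class UrlSafeDemo {
    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xfb, (byte) 0xff, (byte) 0xbe};
        // Standard alphabet: prints "+/++", which a URL parser would mangle.
        System.out.println(Base64.getEncoder().encodeToString(bytes));
        // URL-safe alphabet: prints "-_--", safe inside a URL or fragment.
        System.out.println(Base64.getUrlEncoder().withoutPadding().encodeToString(bytes));
    }
}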

Why do the header and payload in a JWT always start with eyJ?

I am using JWT tokens to authorize my APIs. During implementation I found that the header and payload in the token always start with eyJ. What does this indicate?
JWTs consist of base64url-encoded JSON, and a JSON structure just starts with {"..., which becomes ey... when encoded with a base64 encoder.
The JWT header starts with {"alg":..., which then becomes eyJ...
You can try this on an online encoder: enter {"alg" and click on encode. The result will be eyJhbGci.
I'm afraid the question, and the answer above, are a little too certain.
The best you can check for is (only) 'ey', since the first JSON member could be something else, such as "typ" rather than "alg"; I wouldn't recommend assuming the order of JSON members, even if they are supposed to follow a prescribed order, to allow for real-world anomalies and a small amount of flex.
Also, however unlikely for a given implementation, there could be whitespace after the opening JSON object brace (and maybe even before it). I'm not sure whether the standards/RFCs forbid this, but even if it only ever occurs as a transient bug in the JWT generation process, it could in theory happen. So you're better off checking only for 'ey' as a quick smoke test before proceeding to a full validation of the JWT.
(F.Y.I. I believe it may have been the Microsoft Identity Platform where "typ" preceded "alg", if memory serves, but I can't swear to it or to where I've seen this being the case, at least at one point in time.)
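A small sketch backing up both points: any JSON object that begins with {" base64url-encodes to a string starting with eyJ, whether the first member is alg or typ, while a stray leading space breaks even the ey prefix.
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EyjDemo {
    static String b64url(String s) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(b64url("{\"alg\":\"HS256\"}")); // eyJhbGciOiJIUzI1NiJ9
        System.out.println(b64url("{\"typ\":\"JWT\"}"));   // eyJ0eXAiOiJKV1QifQ
        System.out.println(b64url(" {\"alg\":\"HS256\"}")); // leading space: starts "IHsi", not "ey"
    }
}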

When would you use an unprotected JWS header?

I don't understand why JWS unprotected headers exist.
For some context: a JWS unprotected header contains parameters that are not integrity protected and can only be used per-signature with JSON Serialization.
If they could be used as a top-level header, I could see why someone could want to include a mutable parameter (that wouldn't change the signature). However, this is not the case.
Can anyone think of a use-case or know why they are included in the spec?
Thanks!
JWS Spec
The answer by Florent leaves me unsatisfied.
Regarding the example of using a JWT to sign a hash of a document... the assertion is that the algorithm and keyID would be "sensitive data" that needs to be "protected". By which I suppose he means "signed". But there's no need to sign the algorithm and keyID.
Example
Suppose Bob creates a signed JWT that contains an unprotected header asserting alg=HS256 and keyid=XXXX1. This JWT is intended for transmission to Alice.
Case 1
Suppose Mallory intercepts the signed JWT sent by Bob. Mallory then creates a new unprotected header, asserting alg=None.
The receiver (Alice) is now responsible for verifying the signature on the payload. Alice must not be satisfied with "no signature"; in fact Alice must not rely on a client (sender) assertion to determine which signing algorithm is acceptable for her. Therefore Alice rejects the JWT with the contrived "no signature" header.
Case 2
Suppose Mallory contrives a header with alg=RS256 and keyId=XXX1. Now Alice tries to validate the signature and finds either:
the algorithm is not compliant
the key specified for that algorithm does not exist
Therefore Alice rejects the JWT.
Case 3
Suppose Mallory contrives a header with alg=HS256 and keyId=ZZ3. Now Alice tries to validate the signature and finds the key is unknown, and rejects the JWT.
In no case does the algorithm need to be part of the signed material. There is no scenario under which an unprotected header leads to a vulnerability or violation of integrity.
Getting Back to the Original Question
The original question was: what is the purpose of an unprotected JWS header?
Succinctly, the purpose of an unprotected JWS header is to allow transport of some metadata that can be used as hints by the receiver, like alg (Algorithm) and kid (Key ID). Florent suggests that stuffing data into an unprotected header could improve efficiency. This isn't a good reason. Here is the key point: the claims in the unprotected header are hints, not to be relied upon or trusted.
A more interesting question is: What is the purpose of a protected JWS header? Why have a provision that signs both the "header" and the "payload"? In the case of a JWS Protected Header, the header and payload are concatenated and the result is signed. Assuming the header is JSON and the payload is JSON, at this point there is no semantic distinction between the header and payload. So why have the provision to sign the header at all?
One could just rely on JWS with unprotected headers. If there is a need for integrity-protected claims, put them in the payload. If there is a need for hints, put them in the unprotected header. Sign the payload and not the header. Simple.
This works, and is valid. But it presumes that the payload is JSON. This is true with JWT, but not true with all JWS. RFC 7515, which defines JWS, does not require the signed payload to be JSON. Imagine the payload is a digital image of a medical scan. It's not JSON. One cannot simply "attach claims" to that. Therefore JWS allows a protected header, such that the (non JSON) payload AND arbitrary claims can be signed and integrity checked.
In the case where the payload is non-JSON and the header is protected, there is no facility to include "extra non signed headers" into the JWS. If there is a need for sending some data that needs to be integrity checked and some that are simply "hints", there really is only one container: the protected header. And the hints get signed along with the real claims.
One could avoid the need for this protected-header trick, by just wrapping a JSON hash around the data-to-be-signed. For example:
{
"image" : "qw93u9839839...base64-encoded image data..."
}
And after doing so, one could add claims to this JSON wrapper.
{
"image" : "qw93u9839839...base64-encoded image data...",
"author" : "Whatever"
}
And those claims would then be signed and integrity-protected.
But in the case of binary data, encoding it to a string to allow encapsulation into a JSON may inflate the data significantly. A JWS with a non-JSON payload avoids this.
HTH
The RFC gives us examples of unprotected headers as follows:
A.6.2. JWS Per-Signature Unprotected Headers
Key ID values are supplied for both keys using per-signature Header Parameters. The two JWS Unprotected Header values used to represent these key IDs are:
{"kid":"2010-12-29"}
and
{"kid":"e9bc097a-ce51-4036-9562-d2ade882db0d"}
https://datatracker.ietf.org/doc/html/rfc7515#appendix-A.6.2
The use of kid in the example is likely not a coincidence. Because JWS allows multiple signatures per payload, a cleartext hint system can be useful. For example, if a key is not available to the verifier (server), then you can skip decoding the protected header. The term "hint" is actually used in the kid definition:
4.1.4. "kid" (Key ID) Header Parameter
The "kid" (key ID) Header Parameter is a hint indicating which key was used to secure the JWS. This parameter allows originators to explicitly signal a change of key to recipients. The structure of the "kid" value is unspecified. Its value MUST be a case-sensitive string. Use of this Header Parameter is OPTIONAL.
https://datatracker.ietf.org/doc/html/rfc7515#section-4.1.4
If we look at Key Identification, it describes when a kid does not have to be integrity protected (i.e., when it can be part of the unprotected headers): (emphasis mine)
6. Key Identification
It is necessary for the recipient of a JWS to be able to determine the key that was employed for the digital signature or MAC operation. The key employed can be identified using the Header Parameter methods described in Section 4.1 or can be identified using methods that are outside the scope of this specification. Specifically, the Header Parameters "jku", "jwk", "kid", "x5u", "x5c", "x5t", and "x5t#S256" can be used to identify the key used. These Header Parameters MUST be integrity protected if the information that they convey is to be utilized in a trust decision; however, if the only information used in the trust decision is a key, these parameters need not be integrity protected, since changing them in a way that causes a different key to be used will cause the validation to fail.
The producer SHOULD include sufficient information in the Header Parameters to identify the key used, unless the application uses another means or convention to determine the key used. Validation of the signature or MAC fails when the algorithm used requires a key (which is true of all algorithms except for "none") and the key used cannot be determined.
The means of exchanging any shared symmetric keys used is outside the scope of this specification.
https://datatracker.ietf.org/doc/html/rfc7515#section-6
Simplified: if somebody modifies the kid so that it refers to another key, then the signature itself will not match. Therefore you don't have to include the kid in the protected header. A good example of the first part, where the information conveyed is used in a trust decision, is ACME (aka the Let's Encrypt protocol). When creating an account and storing the key data, you want to trust the kid: we want to store the kid, so we need to make sure it's valid. After the server has stored the kid and can use it to look up a key, we can push messages and reference the kid in the unprotected header (not done by ACME, but possible). Since we're only going to verify the signature, the kid is used as a hint or reference to which key was used for the account. If that field is tampered with, it'll point to a nonexistent or completely different key and fail the signature check. That means the kid itself is "the only information used in the trust decision".
There are also more theoretical scenarios that you can come up with, knowing how it works.
For example, take the idea of having multiple signatures that you can pass on (exchange). A signing authority can include one signature for an intermediary (server) and another for a further recipient (the end-user client). These are differentiated by the kid, and the server doesn't need to verify or even decode the protected header or signature. Or perhaps the intermediary doesn't have the client's secret needed to verify a signature.
For example, a multi-recipient message (e.g. a chat room) could be processed by a relay/proxy that uses the kid in the unprotected header to pass along a reconstructed compact JWS (${protected}.${payload}.${signature}) to each recipient, selected by kid (or any other custom unprotected header field, like userId or endpoint).
Another example would be a server with access to many different keys, where a cleartext kid is faster than iterating over and decoding each protected header to find the right one.
From one perspective, all you're doing is skipping base64url-decoding the protected header for performance; but if you're going to proxy/relay the data, then you're also not polluting the protected header, which is meant for another recipient.
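To illustrate the hint pattern in code, a sketch under assumed names (this is not ACME's actual flow): the verifier picks a key by the cleartext kid and lets the signature check catch any tampering.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: an unprotected "kid" is only a lookup hint (hypothetical setup, HS256).
// If the kid was tampered with, either the lookup or the HMAC check fails.
public class KidHint {
    static boolean verify(String compactJws, String kid, Map<String, byte[]> keys) throws Exception {
        byte[] key = keys.get(kid);            // hint: pick a key without decoding any header
        if (key == null) return false;         // unknown kid -> reject
        int dot = compactJws.lastIndexOf('.');
        String signingInput = compactJws.substring(0, dot);  // protected-header.payload
        byte[] sig = Base64.getUrlDecoder().decode(compactJws.substring(dot + 1));
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        byte[] expected = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
        return MessageDigest.isEqual(expected, sig);         // constant-time compare
    }
}
Note the kid never enters the MAC; tampering with it can only select the wrong key, which makes the comparison fail.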

What is the length of the access_token in Facebook OAuth2?

I searched on Google and Stack Overflow to find an answer to my question but couldn't find one.
I'd like to store the access_token in my database for offline access, and I'd like to be sure to specify the correct length for my column.
I can't even find whether it's just a number or a mix of numbers and letters.
I work at Facebook and I can give a definitive answer about this.
Please don't put a maximum size on the storage for an access token. We expect that they will both grow and shrink over time as we add and remove data and change how they are encoded.
We did give guidance in one place about it being 255 characters. I've updated the blog post that had that information and updated our new access token docs to include a note about sizes:
https://developers.facebook.com/docs/facebook-login/access-tokens/
Sorry for the confusion.
With Facebook's recent move to encrypted access tokens, the length of the access token can be up to 255 characters. If you're storing the access token in your database, the column should be able to accommodate at least varchar(255). Here's an excerpt from Facebook's Developer blog from October 4, 2011:
"With the Encrypted Access Token migration enabled, the format of the access token has changed. The new access token format is completely opaque and you should not take any dependency on the format in your code. A varchar(255) field will be sufficient to store the new tokens."
Full blog post here: https://developers.facebook.com/blog/post/572
This answer is no longer correct, and I can't find a corrected value in FB's docs. We have been receiving access tokens that are longer than 255 characters. We're moving from VARCHAR to a SMALLTEXT instead to try to future-proof things.
From section 1.4 of The OAuth 2.0 Authorization Protocol (draft-ietf-oauth-v2-22)
Access tokens can have different formats, structures, and methods of utilization (e.g. cryptographic properties) based on the resource server security requirements. Access token attributes and the methods used to access protected resources are beyond the scope of this specification and are defined by companion specifications.
I looked for the "companion specifications" but didn't find anything relevant, and in section 11.2.2 it states:
o Parameter name: access_token
o Parameter usage location: authorization response, token response
o Change controller: IETF
o Specification document(s): [[ this document ]]
This seems to indicate that the access_token parameter is defined within this spec, which I guess it is, but the actual access token isn't fully fleshed out.
Update:
The latest draft of the specification (draft-ietf-oauth-v2-31) includes an appendix that better defines what to expect from the access_token parameter:
A.12. "access_token" Syntax
The "access_token" element is defined in Section 4.2.2 and
Section 5.1:
access-token = 1*VSCHAR
So essentially this means that the access_token must be at least 1 character long, but this specification defines no upper limit on its length.
Note they define VSCHAR = %x20-7E
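If you want to sanity-check tokens against that grammar, here is a sketch (my own, class name hypothetical): VSCHAR = %x20-7E is simply the printable ASCII range plus space, which maps directly onto a regex character class.
import java.util.regex.Pattern;

public class AccessTokenSyntax {
    // access-token = 1*VSCHAR, with VSCHAR = %x20-7E (see the appendix quoted above)
    static final Pattern ACCESS_TOKEN = Pattern.compile("^[\\x20-\\x7E]+$");

    public static void main(String[] args) {
        System.out.println(ACCESS_TOKEN.matcher("EAACEdEose0cBA").matches()); // true: all printable ASCII
        System.out.println(ACCESS_TOKEN.matcher("has\nnewline").matches());   // false: '\n' is not a VSCHAR
    }
}
Note this checks only syntax; it says nothing about length, which is exactly the point above.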
Facebook access tokens can be longer than 255 characters. I had a lot of errors like ActiveRecord::StatementInvalid: PG::StringDataRightTruncation: ERROR: value too long for type character varying(255), where the value was a Facebook access token. Do not use a string-type column, because its length is limited. You can use a text-type column to store tokens.
Recently, our app has been seeing them longer than 100 characters. I'm still looking for documentation so I can figure out a 'safe' field size for them.
I'll update this answer as I find out more.
From the OAuth2 documentation,
The access token string size is left undefined by this specification. The client should avoid making assumptions about value sizes. The authorization server should document the size of any value it issues.
(Section 4.2.2 of this document)
Note: Facebook uses OAuth2, as mentioned on this page.
So for now, no information seems to be available on Facebook's developer portal about the length of the OAuth token. Yahoo seems to use a 400-bit-long token, so it's best to assume that a TEXT column in MySQL is safer than a VARCHAR.