Why do you use base64 URL encoding with JSON web tokens? - hash

The Scenario:
I'm reading about JSON web tokens at this link (https://medium.com/vandium-software/5-easy-steps-to-understanding-json-web-tokens-jwt-1164c0adfcec). It outline how to create a JSON web token, you create a header and a payload, and then create a signature using the following pseudocode:
data = base64urlEncode( header ) + “.” + base64urlEncode( payload )
hashedData = hash( data, secret )
signature = base64urlEncode( hashedData )
My Question:
Why does the pseudocode use base64urlEncode when creating data and signature?
Scope Of What I Understand So Far:
Base64 allows you to express binary data using text characters from the Base64 set of 64 text characters. This is usually used when you have a set of data that you want to pass through some channel that might misinterpret some of the characters, but would not misinterpret Base64 characters, so you encode it using Base64 so that the data won't get misinterpreted. Base64 URL encoding, on the other hand, is analogous to Base64 encoding except that you use only a subset of the Base64 character set that does not include characters that have special meaning in URLs, so that if you use the Base64 URL encoded string in a URL, its meaning won't get misinterpreted.
Assuming my understanding there is correct, I'm trying to understand why base64urlEncode() is used in computing data and signature in the pseudocode above. Is the signature of a JSON web token going to be used somewhere in a URL? If so, why is data base64urlEncoded as well before hashing. Why not just encode the signature? Is there something about the hash function that would require its data parameter to be Base64 URL encoded?

When using the OAuth Implicit Grant, JWTs may be transferred as part of URL fragments.
That is just an example, but I guess in general it was presumed that JWTs might be passed through URLs, so base64urlEncodeing them makes sense.
The first line of the IETF JWT standard abstract even says:
JSON Web Token (JWT) is a compact, URL-safe means of representing claims to be transferred between two parties.
(Note that the OAuth Implicit Grant is no longer recommended to be used.)

Related

Should SAMLResponse contain extra line breaks?

I have been working on a solution that retrieves SAMLResponse from third-party IdP and we simply decode that SAMLResposne with jdk Base64 decoder,
However one of the cases is where we get SAMLResponse with line breaks (\n) after some characters and when we try to decode it with,
...
byte[] base64DecodedResponse = Base64.getDecoder().decode(authnResponse);
...
This authnResposne is SAMLResponse from HTTP header which has \n new line, this failed to parse in above code.
I have been looking for a confirmation whether any SAMLResponse received by SPs must be in Base64 encoded format hence should never contain line breaks or it can be and SP should handle it.
Applying fix from SP side is simple, simply .replaceAll("\n","") will do the job, but is it really industry standard to EDIT the SAMLResponse?
You probably need to use Base64.Decoder getUrlDecoder()
SAML2 is supposed to be encoded in base64url - basically Base 64 Encoding with URL and Filename Safe Alphabet https://www.rfc-editor.org/rfc/rfc4648#section-5.
getUrlDecoder also should reject embedded newlines in the base64 so it may not do you any good.
I would be interested in knowing which SAML provider you are using.
For those who looking for wisdom here,
Editing SAMLResponse after it's signed is bad practice.
According to SAML documentation, SAMLResponse encoding can have either BASE 64 Content-Transfer encoding RFC-2045 or Base64 Encoding RFC-4648.
From SAML 2.0 core-doc, section 5
Profiles MAY specify alternative signature mechanisms such as S/MIME or signed Java objects that contain SAML documents. Caveats about retaining context and interoperability apply
This leads to the justification that SPs should be able to decode Standard and MiMe decoding, hence a try and catch block with Base64.getMimeDecoder() to get around this issue.

Requests fail authorization when query string contains certain characters

I'm making requests to Twitter, using the OAuth1.0 signing process to set the Authorization header. They explain it step-by-step here, which I've followed. It all works, most of the time.
Authorization fails whenever special characters are sent without percent encoding in the query component of the request. For example, ?status=hello%20world! fails, but ?status=hello%20world%21 succeeds. But the change from ! to the percent encoded form %21 is only made in the URL, after the signature is generated.
So I'm confused as to why this fails, because AFAIK that's a legally encoded query string. Only the raw strings ("status", "hello world!") are used for signature generation, and I'd assume the server would remove any percent encoding from the query params and generate its own signature for comparison.
When it comes to building the URL, I let URLComponents do the work, so I don't add percent encoding manually, ex.
var urlComps = URLComponents()
urlComps.scheme = "https"
urlComps.host = host
urlComps.path = path
urlComps.queryItems = [URLQueryItem(key: "status", value: "hello world!")]
urlComps.percentEncodedQuery // "status=hello%20world!"
I wanted to see how Postman handled the same request. I selected OAuth1.0 as the Auth type and plugged in the same credentials. The request succeeded. I checked the Postman console and saw ?status=hello%20world%21; it was percent encoding the !. I updated Postman, because a nice little prompt asked me to. Then I tried the same request; now it was getting an authorization failure, and I saw ?status=hello%20world! in the console; the ! was no longer being percent encoded.
I'm wondering who is at fault here. Perhaps Postman and I are making the same mistake. Perhaps it's with Twitter. Or perhaps there's some proxy along the way that idk, double encodes my !.
The OAuth1.0 spec says this, which I believe is in the context of both client (taking a request that's ready to go and signing it before it's sent), and server (for generating another signature to compare against the one received):
The parameters from the following sources are collected into a
single list of name/value pairs:
The query component of the HTTP request URI as defined by
[RFC3986], Section 3.4. The query component is parsed into a list
of name/value pairs by treating it as an
"application/x-www-form-urlencoded" string, separating the names
and values and decoding them as defined by
[W3C.REC-html40-19980424], Section 17.13.4.
That last reference, here, outlines the encoding for application/x-www-form-urlencoded, and says that space characters should be replaced with +, non-alphanumeric characters should be percent encoded, name separated from value by =, and pairs separated by &.
So, the OAuth1.0 spec says that the query string of the URL needs to be decoded as defined by application/x-www-form-urlencoded. Does that mean that our query string needs to be encoded this way too?
It seems to me, if a request is to be signed using OAuth1.0, the query component of the URL that gets sent must be encoded in a way that is different to what it would normally be encoded in? That's a pretty significant detail if you ask me. And I haven't seen it explicitly mentioned, even in Twitter's documentation. And evidently the folks at Postman overlooked it too? Unless I'm not supposed to be using URLComponents to build a URL, but that's what it's for, no? Have I understood this correctly?
Note: ?status=hello+world%21 succeeds; it tweets "hello world!"
I ran into a similar issue.
put the status in post body, not query string.
Percent-encoding:
private encode(str: string) {
// encodeURIComponent() escapes all characters except: A-Z a-z 0-9 - _ . ! ~ * " ( )
// RFC 3986 section 2.3 Unreserved Characters (January 2005): A-Z a-z 0-9 - _ . ~
return encodeURIComponent(str)
.replace(/[!'()*]/g, c => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}

Are CSRF tokens base64url encoded?

I'm working on an application that parses a CSRF token from a cookie header. I'd like to know whether CSRF tokens are base64 encoded with URL-safe characters (cf. https://simplycalc.com/base64url-encode.php) so that I can match them with the regular expression
[.a-zA-Z0-9_-]+
I was able to find documentation on JSON web tokens (JWTs) indicating that they consist of base64url-encoded portions separated by periods ('.'), but I wasn't able to find similar documentation on CSRF tokens.
Are CSRF tokens also generally limited to a certain character set, or can they contain any characters?
A CSRF token is an opaque "nonce" that doesn't contain any info -- the token in the form submission and the token in the cookie or header simply have to match is all. If you see them base64-encoded, it's just for convenience of transmission, but it won't decode to anything useful, just random bytes most of the time. Nothing like the JSON structure of JWT.
Looking at my current framework (Laravel), its CSRF tokens are just random strings (they're derived from base64, but not valid base64). Chances are that's the case for most other frameworks too.

How to manually validate a JWT signature using online tools

From what I can understand, it's a straight forward process to validate a JWT signature. But when I use some online tools to do this for me, it doesn't match. How can I manually validate a JWT signature without using a JWT library? I'm needing a quick method (using available online tools) to demo how this is done.
I created my JWT on https://jwt.io/#debugger-io with the below info:
Algorithm: HS256
Secret: hONPMX3tHWIp9jwLDtoCUwFAtH0RwSK6
Header:
{
"alg": "HS256",
"typ": "JWT"
}
Payload:
{
"sub": "1234567890",
"name": "John Doe",
"iat": 1516239022
}
Verify Signature (section):
Secret value changed to above
"Checked" secret base64 encoded (whether this is checked or not, still get a different value)
JWT:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.wDQ2mU5n89f2HsHm1dluHGNebbXeNr748yJ9kUNDNCA
Manual JWT signature verification attempt:
Using a base64UrlEncode calculator (http://www.simplycalc.com/base64url-encode.php or https://www.base64encode.org/)
If I: (Not actual value on sites, modified to show what the tools would ultimately build for me)
base64UrlEncode("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9") + "." + base64UrlEncode("eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ")
I get:
ZXlKaGJHY2lPaUpJVXpJMU5pSXNJblI1Y0NJNklrcFhWQ0o5.ZXlKemRXSWlPaUl4TWpNME5UWTNPRGt3SWl3aWJtRnRaU0k2SWtwdmFHNGdSRzlsSWl3aWFXRjBJam94TlRFMk1qTTVNREl5ZlE=
NOTE: there's some confusion on my part if I should be encoding the already encoded values, or use the already encoded values as-is.
(i.e. using base64UrlEncode("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9") + "." + base64UrlEncode("eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ") vs "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ").
Regardless on which I should do, the end result still doesn't match the signature. I'm leaning towards that I should NOT re-encode the encoded value, whether that's true or not.
Then using a HMAC Generator calculator (https://codebeautify.org/hmac-generator or https://www.freeformatter.com/hmac-generator.html#ad-output)
(Not actual value on sites, modified to show what the tools would ultimately build for me)
HMACSHA256(
"ZXlKaGJHY2lPaUpJVXpJMU5pSXNJblI1Y0NJNklrcFhWQ0o5.ZXlKemRXSWlPaUl4TWpNME5UWTNPRGt3SWl3aWJtRnRaU0k2SWtwdmFHNGdSRzlsSWl3aWFXRjBJam94TlRFMk1qTTVNREl5ZlE=",
"hONPMX3tHWIp9jwLDtoCUwFAtH0RwSK6"
)
Which gets me:
a2de322575675ba19ec272e83634755d4c3c2cd74e9e23c8e4c45e1683536e01
And that doesn't match the signature portion of the JWT:
wDQ2mU5n89f2HsHm1dluHGNebbXeNr748yJ9kUNDNCAM != a2de322575675ba19ec272e83634755d4c3c2cd74e9e23c8e4c45e1683536e01
Purpose:
The reason I'm needing to confirm this is to prove the ability to validate that the JWT hasn't been tampered with, without decoding the JWT.
My clients web interface doesn't need to decode the JWT, so there's no need for them to install a jwt package for doing that. They just need to do a simple validation to confirm the JWT hasn't been tampered with (however unlikely that may be) before they store the JWT for future API calls.
It's all a matter of formats and encoding.
On https://jwt.io you get this token based on your input values and secret:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.3pIaKksiX9Zv8Jg-hWbrD24VhL36hBIFaNpA4fVx29M
We want to prove that the signature:
3pIaKksiX9Zv8Jg-hWbrD24VhL36hBIFaNpA4fVx29M
is correct.
The signature is a HMAC-SHA256 hash that is Base64url encoded.
(as described in RFC7515)
When you use the online HMAC generator to calculate a hash for
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ
with the secret
hONPMX3tHWIp9jwLDtoCUwFAtH0RwSK6
you get
de921a2a4b225fd66ff0983e8566eb0f6e1584bdfa84120568da40e1f571dbd3
as result, which is a HMAC-SHA256 value, but not Base64url encoded. This hash is a hexadecimal string representation of a large number.
To compare it with the value from https://jwt.io you need to convert the value from it's hexadecimal string representation back to a number and Base64url encode it.
The following script is doing that and also uses crypto-js to calculate it's own hash. This can also be a way for you to verify without JWT libraries.
var CryptoJS = require("crypto-js");
// the input values
var base64Header = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9";
var base64Payload = "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ";
var secret = "hONPMX3tHWIp9jwLDtoCUwFAtH0RwSK6";
// two hashes from different online tools
var signatureJWTIO = "3pIaKksiX9Zv8Jg-hWbrD24VhL36hBIFaNpA4fVx29M";
var onlineCaluclatedHS256 = "de921a2a4b225fd66ff0983e8566eb0f6e1584bdfa84120568da40e1f571dbd3";
// hash calculation with Crypto-JS.
// The two replace expressions convert Base64 to Base64url format by replacing
// '+' with '-', '/' with '_' and stripping the '=' padding
var base64Signature = CryptoJS.HmacSHA256(base64Header + "." + base64Payload , secret).toString(CryptoJS.enc.Base64).replace(/\+/g,'-').replace(/\//g,'_').replace(/\=+$/m,'');
// converting the online calculated value to Base64 representation
var base64hash = new Buffer.from(onlineCaluclatedHS256, 'hex').toString('base64').replace(/\//g,'_').replace(/\+/g,'-').replace(/\=+$/m,'')
// the results:
console.log("Signature from JWT.IO : " + signatureJWTIO);
console.log("NodeJS calculated hash : " + base64Signature);
console.log("online calulated hash (converted) : " + base64hash);
The results are:
Signature from JWT.IO : 3pIaKksiX9Zv8Jg-hWbrD24VhL36hBIFaNpA4fVx29M
NodeJS calculated hash : 3pIaKksiX9Zv8Jg-hWbrD24VhL36hBIFaNpA4fVx29M
online calulated hash (converted) : 3pIaKksiX9Zv8Jg-hWbrD24VhL36hBIFaNpA4fVx29M
identical!
Conclusion:
The values calculated by the different online tools are all correct but not directly comparable due to different formats and encodings.
A little script as shown above might be a better solution.
I had the same problem until I figured out that I was using plain base64 encoding instead of base64url.
There are also some minor details in between.
Here is the step-by-step manual that will, hopefully, make the whole process much more clear.
Notes
Note 1: You must remove all spaces and newlines from your JSON strings (header and payload).
It is implicitly done on jwt.io when you generate a JWT token.
Note 2: To convert JSON string to base64url string on cryptii.com create the following configuration:
First view: Text
Second view: Encode
Encoding: Base64
Variant: Standard 'base64url' (RFC 4648 §5)
Third view: Text
Note 3: To convert HMAC HEX code (signature) to base64url string on cryptii.com create the following configuration:
First view: Bytes
Format: Hexadecimal
Group by: None
Second view: Encode
Encoding: Base64
Variant: Standard 'base64url' (RFC 4648 §5)
Third view: Text
Manual
You are going to need only two online tools:
[Tool 1]: cryptii.com - for base64url encoding,
[Tool 2]: codebeautify.org - for HMAC calculation.
On cryptii.com you can do both base64url encoding/decoding and also HMAC calculation, but for HMAC you need to provide a HEX key which is different from the input on jwt.io, so I used a separate service for HMAC calculation.
Input data
In this manual I used the following data:
Header:
{"alg":"HS256","typ":"JWT"}
Payload:
{"sub":"1234567890","name":"John Doe","iat":1516239022}
Secret (key):
The Earth is flat!
The secret is not base64 encoded.
Step 1: Convert header [Tool 1]
Header (plain text):
{"alg":"HS256","typ":"JWT"}
Header (base64url encoded):
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
Step 2: Convert payload [Tool 1]
Payload (plain text):
{"sub":"1234567890","name":"John Doe","iat":1516239022}
Payload (base64url encoded):
eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ
Step 3: Calculate HMAC code (signature) [Tool 2]
Calculate HMAC using SHA256 algorithm.
Input string (base64url encoded header and payload, concatenated with a dot):
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ
Calculated code (HEX number):
c8a9ae59f3d64564364a864d22490cc666c74c66a3822be04a9a9287a707b352
The calculated HMAC code is a HEX representation of the signature.
Note: it should not be encoded to base64url as a plain text string but as a sequence of bytes.
Step 4: Encode calculated HMAC code to base64url [Tool 1]:
Signature (Bytes):
c8a9ae59f3d64564364a864d22490cc666c74c66a3822be04a9a9287a707b352
Signature (base64url encoded):
yKmuWfPWRWQ2SoZNIkkMxmbHTGajgivgSpqSh6cHs1I
Summary
Here are our results (all base64url encoded):
Header:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
Payload:
eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ
Signature:
yKmuWfPWRWQ2SoZNIkkMxmbHTGajgivgSpqSh6cHs1I
The results from jwt.io:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.yKmuWfPWRWQ2SoZNIkkMxmbHTGajgivgSpqSh6cHs1I
As you can see, all three parts are identical.

How can I reverse engineer the encode method used here?

I have a string:
RP581147238IN which gets encoded as A3294Fc0Mb0V1Tb4aBK8rw==
and another string:
RP581147239IN which gets encoded as A3294Fc0Mb1BPqxRDrRXjQ==
But after spending a day, I still cannot figure out what is the encoding process.
The encoded string looks like its base64 encoded.
But when I decode it, it looks like:
base64.decodestring("A3294Fc0Mb0V1Tb4aBK8rw==")
\x03}\xbd\xe0W41\xbdA>\xacQ\x0e\xb4W\x8d
The base 64 decoded string now is looking like a zlib compressed string
I've tried to further use zlib decompression methods but none of them worked.
import zlib, base64
rt = 'A3294Fc0Mb1BPqxRDrRXjQ=='
for i in range(-50, 50):
try:
print(zlib.decompress(base64.decodestring(rt), i));
print("{} worked".format(i))
break
except:
pass
But that did not produce any results either.
Can anybody figure out what is the encoding process used here. #Nirlzr, I am looking at you for the heroic answer you provided in Reverse Engineer HTTP request.
The strings seem to be Base64 encoded and the underlying decoded data seems to be encrypted. Encrypted data can not be directly represented as a string and it is common the Base64 encode encrypted data when a string is required.
If this is the case you need to decrypt the decoded data and ignorer to accomplish that you would need the encryption key.
Note: In general it is not productive to compress such short items.
If you put your data strings side by side:
RP581147238IN A3294Fc0Mb0V1Tb4aBK8rw==
RP581147239IN A3294Fc0Mb1BPqxRDrRXjQ==
You can see that source strings have only character difference, but encoded version contains 12 different characters:
----------8-- ----------0V1Tb4aBK8rw--
----------9-- ----------1BPqxRDrRXjQ--
Encoded data has similar paddings at the end as base64, but definitely it is not base64. Probably crypted with some SHA-like algorithm. With the data you provided, I would say that it is not possible to reverse-engineer the encoding process. Probably more data would not help much either.