Should SAMLResponse contain extra line breaks? - saml

I have been working on a solution that retrieves a SAMLResponse from a third-party IdP, and we simply decode that SAMLResponse with the JDK Base64 decoder.
However, in one case we get a SAMLResponse with line breaks (\n) after some characters, and we try to decode it with:
...
byte[] base64DecodedResponse = Base64.getDecoder().decode(authnResponse);
...
This authnResponse is the SAMLResponse from the HTTP header, which contains \n newlines, and it fails to parse in the code above.
I am looking for confirmation on whether any SAMLResponse received by an SP must be plain Base64 and hence should never contain line breaks, or whether it can contain them and the SP should handle it.
Applying a fix on the SP side is simple: .replaceAll("\n","") will do the job. But is it really industry standard to EDIT the SAMLResponse?

You probably need to use Base64.getUrlDecoder().
SAML2 is supposed to be encoded in base64url, which is basically Base 64 Encoding with URL and Filename Safe Alphabet: https://www.rfc-editor.org/rfc/rfc4648#section-5.
However, getUrlDecoder() should also reject embedded newlines in the Base64, so it may not do you any good.
I would be interested in knowing which SAML provider you are using.

For those looking for an answer here:
Editing the SAMLResponse after it has been signed is bad practice.
According to the SAML documentation, the SAMLResponse can be encoded with either Base64 Content-Transfer-Encoding (RFC 2045) or Base64 encoding (RFC 4648).
From the SAML 2.0 core document, section 5:
Profiles MAY specify alternative signature mechanisms such as S/MIME or signed Java objects that contain SAML documents. Caveats about retaining context and interoperability apply.
This leads to the conclusion that SPs should be able to decode both standard and MIME Base64, hence a try/catch block that falls back to Base64.getMimeDecoder() to get around this issue.
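A minimal sketch of the try/catch fallback described above. The class and method names (SamlResponseDecoder, decodeSamlResponse) and the sample input are made up for illustration; only the java.util.Base64 calls come from the JDK. Note that the MIME decoder silently skips every character outside the Base64 alphabet, not just line breaks, so it is the more lenient of the two.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class SamlResponseDecoder {

    // Hypothetical helper: try the strict decoder first, then fall back to
    // the MIME decoder, which ignores line separators in the input.
    static byte[] decodeSamlResponse(String samlResponse) {
        try {
            return Base64.getDecoder().decode(samlResponse);
        } catch (IllegalArgumentException e) {
            // The basic decoder rejects '\n'; the MIME decoder skips it.
            return Base64.getMimeDecoder().decode(samlResponse);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only: "<Response/>" encoded, with a line break inserted.
        String wrapped = "PFJlc3Bv\nbnNlLz4=";
        System.out.println(new String(decodeSamlResponse(wrapped), StandardCharsets.UTF_8));
    }
}
```

The input string itself is left untouched; only the decoder's tolerance changes.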

Related

Are CSRF tokens base64url encoded?

I'm working on an application that parses a CSRF token from a cookie header. I'd like to know whether CSRF tokens are base64 encoded with URL-safe characters (cf. https://simplycalc.com/base64url-encode.php) so that I can match them with the regular expression
[.a-zA-Z0-9_-]+
I was able to find documentation on JSON web tokens (JWTs) indicating that they consist of base64url-encoded portions separated by periods ('.'), but I wasn't able to find similar documentation on CSRF tokens.
Are CSRF tokens also generally limited to a certain character set, or can they contain any characters?
A CSRF token is an opaque "nonce" that doesn't contain any info: the token in the form submission and the token in the cookie or header simply have to match. If you see them base64-encoded, it's just for convenience of transmission, but it won't decode to anything useful, just random bytes most of the time. Nothing like the JSON structure of a JWT.
Looking at my current framework (Laravel), its CSRF tokens are just random strings (they're derived from base64, but not valid base64). Chances are that's the case for most other frameworks too.
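To illustrate the "opaque random value that just has to match" idea, here is a rough sketch of how a framework might do it. This is an assumption for illustration, not how Laravel or any particular framework works; the class and method names (CsrfTokens, newToken, matches) are hypothetical. A token produced this way happens to fit the [.a-zA-Z0-9_-]+ pattern, but nothing obliges other implementations to use this alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

class CsrfTokens {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 32 random bytes, base64url-encoded only so the value survives transport.
    static String newToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // The server never decodes the token; it only checks that both copies match.
    // MessageDigest.isEqual is used to compare without an early exit.
    static boolean matches(String submitted, String expected) {
        return MessageDigest.isEqual(
                submitted.getBytes(StandardCharsets.UTF_8),
                expected.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String token = newToken();
        System.out.println(token);
        System.out.println(matches(token, token));
    }
}
```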

Why do you use base64 URL encoding with JSON web tokens?

The Scenario:
I'm reading about JSON web tokens at this link (https://medium.com/vandium-software/5-easy-steps-to-understanding-json-web-tokens-jwt-1164c0adfcec). It outlines how to create a JSON web token: you create a header and a payload, and then create a signature using the following pseudocode:
data = base64urlEncode( header ) + "." + base64urlEncode( payload )
hashedData = hash( data, secret )
signature = base64urlEncode( hashedData )
My Question:
Why does the pseudocode use base64urlEncode when creating data and signature?
Scope Of What I Understand So Far:
Base64 allows you to express binary data using text characters from the Base64 set of 64 text characters. This is usually used when you have a set of data that you want to pass through some channel that might misinterpret some of the characters, but would not misinterpret Base64 characters, so you encode it using Base64 so that the data won't get misinterpreted. Base64 URL encoding, on the other hand, is analogous to Base64 encoding except that you use only a subset of the Base64 character set that does not include characters that have special meaning in URLs, so that if you use the Base64 URL encoded string in a URL, its meaning won't get misinterpreted.
Assuming my understanding there is correct, I'm trying to understand why base64urlEncode() is used in computing data and signature in the pseudocode above. Is the signature of a JSON web token going to be used somewhere in a URL? If so, why is data base64urlEncoded as well before hashing? Why not just encode the signature? Is there something about the hash function that would require its data parameter to be Base64 URL encoded?
When using the OAuth Implicit Grant, JWTs may be transferred as part of URL fragments.
That is just one example, but I guess in general it was presumed that JWTs might be passed through URLs, so base64url-encoding them makes sense.
The first line of the IETF JWT standard abstract even says:
JSON Web Token (JWT) is a compact, URL-safe means of representing claims to be transferred between two parties.
(Note that the OAuth Implicit Grant is no longer recommended to be used.)
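For reference, the pseudocode from the question could be sketched in Java roughly as follows. This assumes the hash( data, secret ) step means HMAC-SHA256 (the HS256 algorithm), which the pseudocode does not state; the class name and the sample header, payload, and secret are made up. The whole token, including both dots, ends up using only URL-safe characters.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class JwtSketch {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    static String base64urlEncode(byte[] bytes) {
        return B64URL.encodeToString(bytes);
    }

    // Assumption: "hash(data, secret)" is HMAC-SHA256, as in HS256 tokens.
    static String sign(String headerJson, String payloadJson, byte[] secret) {
        String data = base64urlEncode(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + base64urlEncode(payloadJson.getBytes(StandardCharsets.UTF_8));
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] hashedData = mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
            // header.payload.signature, every part base64url-encoded
            return data + "." + base64urlEncode(hashedData);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }

    public static void main(String[] args) {
        String token = sign("{\"alg\":\"HS256\",\"typ\":\"JWT\"}",
                "{\"sub\":\"1234567890\"}",
                "example-secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(token);
    }
}
```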

Spring cloud gateway uri decoding failing

I wrote a gateway application using the Spring Cloud Greenwich binaries. I'm seeing issues when special characters are present in the URL: the request fails with the exception below in Spring gateway when the request URI contains special characters.
localhost:8080/myresource/WG_splchar_%26%5E%26%25%5E%26%23%25%24%5E%26%25%26*%25%2B)!%24%23%24%25%26%5E_new
When I hit the above URL, Spring fails with the exception below. I'm not able to figure out why it's an invalid sequence or how cases like this can be handled.
java.lang.IllegalArgumentException: Invalid encoded sequence "%^&#%$^&%&*%+)!$#$%&^_new"
at org.springframework.util.StringUtils.uriDecode(StringUtils.java:741) ~[spring-core-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.http.server.DefaultPathContainer.parsePathSegment(DefaultPathContainer.java:126) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.http.server.DefaultPathContainer.createFromUrlPath(DefaultPathContainer.java:111) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.http.server.PathContainer.parsePath(PathContainer.java:76) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.cloud.gateway.handler.predicate.PathRoutePredicateFactory.lambda$apply$2(PathRoutePredicateFactory.java:79) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at org.springframework.cloud.gateway.support.ServerWebExchangeUtils.lambda$toAsyncPredicate$1(ServerWebExchangeUtils.java:128) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at org.springframework.cloud.gateway.handler.AsyncPredicate.lambda$and$1(AsyncPredicate.java:35) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at org.springframework.cloud.gateway.handler.RoutePredicateHandlerMapping.lambda$null$2(RoutePredicateHandlerMapping.java:112) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onNext(MonoFilterWhen.java:116) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2070) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onSubscribe(MonoFilterWhen.java:103) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.MonoFilterWhen.subscribe(MonoFilterWhen.java:56) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
I answered the other question already and don't feel like retyping. The spirit of the answer is the exact same.
Write a unit test exercising this method from the Spring utils. This is what's breaking. You can try passing in more or less of the string you're concerned about to find where the breakage is. Use a binary search to figure out what's broken. Make sure you don't split the string in the middle of an encoded character, or else you'll give yourself a false positive. When it says you have an invalid sequence, I would expect you have something like %99, where 99 does not map to any valid character (I'm just making one up).
https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/util/StringUtils.html#uriDecode-java.lang.String-java.nio.charset.Charset-
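A rough sketch of that kind of check, using the JDK's java.net.URLDecoder as a stand-in for Spring's StringUtils.uriDecode. That substitution is an assumption: the two are not identical (URLDecoder turns '+' into a space, for one), and URLDecoder.decode(String, Charset) needs Java 10 or later. The input is a shortened, made-up variant of the path in the question. One thing it can show: the exception message in the question quotes text that is already decoded, which would be consistent with the path being decoded a second time. That is only a guess from the message; the source does not confirm it.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DoubleDecodeCheck {

    // Returns true if the string percent-decodes without an error.
    static boolean decodes(String s) {
        try {
            URLDecoder.decode(s, StandardCharsets.UTF_8);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened sample based on the path in the question (hypothetical).
        String encoded = "WG_splchar_%26%5E%26%25%5E_new";
        String once = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(once);           // first decode succeeds
        System.out.println(decodes(once));  // a second decode hits a bare '%'
    }
}
```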
As an aside
Where is this encoded string coming from? Did someone at your company create their own solution to encode this string to begin with? Are you accepting user data? It's VERY POSSIBLE that whoever is responsible for producing this string encoded it incorrectly by hand-rolling their own encoder.
ALTERNATIVELY
spring.cloud.gateway.routes[7].predicates[0]=Path=/test/{testId}/test1/test_%26%5E%26%25%5E%26%25%26*%25%2B)!
When I look at this I see a path that is already encoded. For example, you've taken your ampersand & character and replaced it with %26
Have you tried inputting a path that is NOT already encoded?
For example
spring.cloud.gateway.routes[7].predicates[0]=Path=/test/{testId}/test1/test_&^&%^
(I only partially decoded it by hand using this chart: https://www.w3schools.com/tags/ref_urlencode.asp)

Why header and payload in the JWT token always starts with eyJ

I am using JWT tokens to authorize my APIs. During implementation I found that the header and payload in the token always start with eyJ. What does this indicate?
JWTs consist of base64url-encoded JSON, and a JSON structure just starts with {"..., which becomes ey... when encoded with a base64 encoder.
The JWT header starts with {"alg":..., which then becomes eyJ...
You can try this in an online Base64 encoder: enter {"alg" and click on encode. The result will be eyJhbGci
I'm afraid the question and the answer above are a little too broad/certain.
The most you can reliably check for is 'ey', as the first JSON member could be something other than "alg", such as "typ"; I wouldn't recommend assuming the order of JSON members (even if they are supposed to follow a prescribed order), to allow for real-world anomalies.
Also, unlikely as it probably is for any particular implementation, there could be some whitespace following the opening brace of the JSON object (and maybe even before it). I'm not sure whether the standard/RFCs forbid this, but even if it is only a temporary bug in the JWT generation process, it could in theory occur. So you're better off checking only for 'ey' as a quick smoke test before proceeding to a full validation of the JWT.
(FYI, I believe it may have been Microsoft's 'Identity Platform' where "typ" preceded "alg", if memory serves me correctly, but I can't swear to it or to where I've seen this being the case, at least at one point in time.)
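A quick way to see how the prefix depends on the first few bytes is to encode a few JSON openings directly. The class name and sample inputs are made up for illustration. One detail worth noting: an opening of {" followed by any ASCII letter still encodes to eyJ, so a leading "typ" alone does not change the prefix, whereas whitespace after the brace does change the third character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class JwtPrefixDemo {

    static String base64url(String text) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Three ways a JSON header might begin: "alg" first, "typ" first,
        // and a space after the opening brace.
        String[] openings = {"{\"alg\"", "{\"typ\"", "{ \"alg\""};
        for (String opening : openings) {
            System.out.println(opening + " -> " + base64url(opening));
        }
    }
}
```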

How can I reverse engineer the encode method used here?

I have a string:
RP581147238IN which gets encoded as A3294Fc0Mb0V1Tb4aBK8rw==
and another string:
RP581147239IN which gets encoded as A3294Fc0Mb1BPqxRDrRXjQ==
But after spending a day on it, I still cannot figure out what the encoding process is.
The encoded string looks like it's Base64 encoded.
But when I decode it, it looks like this:
base64.b64decode("A3294Fc0Mb1BPqxRDrRXjQ==")
b'\x03}\xbd\xe0W41\xbdA>\xacQ\x0e\xb4W\x8d'
The Base64-decoded string now looks like a zlib-compressed string.
I've tried using zlib decompression methods on it, but none of them worked.
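import base64
import zlib

rt = 'A3294Fc0Mb1BPqxRDrRXjQ=='
raw = base64.b64decode(rt)  # base64.decodestring was removed in Python 3.9

# Try each wbits value in case the data is a raw or headered zlib stream.
for wbits in range(-50, 50):
    try:
        print(zlib.decompress(raw, wbits))
        print("{} worked".format(wbits))
        break
    except (zlib.error, ValueError):
        pass  # not a valid stream for this wbits value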
But that did not produce any results either.
Can anybody figure out what encoding process is used here? @Nirlzr, I am looking at you for the heroic answer you provided in Reverse Engineer HTTP request.
The strings seem to be Base64 encoded, and the underlying decoded data seems to be encrypted. Encrypted data cannot be directly represented as a string, and it is common to Base64-encode encrypted data when a string is required.
If this is the case, you need to decrypt the decoded data, and in order to accomplish that you would need the encryption key.
Note: In general it is not productive to compress such short items.
If you put your data strings side by side:
RP581147238IN A3294Fc0Mb0V1Tb4aBK8rw==
RP581147239IN A3294Fc0Mb1BPqxRDrRXjQ==
You can see that the source strings differ by only one character, but the encoded versions differ in 12 characters:
----------8-- ----------0V1Tb4aBK8rw--
----------9-- ----------1BPqxRDrRXjQ--
The encoded data has padding at the end similar to Base64, but it is definitely not plain Base64 of the source string. It was probably encrypted or hashed with some SHA-like algorithm. With the data you provided, I would say that it is not possible to reverse-engineer the encoding process. More data would probably not help much either.