Base64 decoding of MIME email not working (GMail API) - email

I'm using the GMail API to retrieve an email's contents. I am getting the following base64 encoded data for the body: http://hastebin.com/ovucoranam.md
But when I run it through a base64 decoder, it either returns an empty string (error) or something that resembles the HTML data but with a bunch of weird characters.
Help?

I'm not sure if you've solved it yet, but GmailGuy is correct. The body uses the URL-safe Base64 alphabet from RFC 4648, so you need to convert it to the standard alphabet first. The gist is you'll need to replace - with + and _ with /.
I've taken your original input and made the replacement: http://hastebin.com/ukanavudaz
And used base64decode.org to decode it, and it was fine.

You need to use the URL-safe (aka "web-safe") base64 decoding alphabet (see RFC 4648), which it doesn't appear you're doing. Using the standard base64 alphabet may work sometimes but not always (two of the 64 characters are different).
Docs don't seem to consistently mention this important detail. Here's one where it does though:
https://developers.google.com/gmail/api/guides/drafts
Also, if your particular library doesn't support the "URL safe" alphabet then you can do string substitution on the string first ("-" with "+" and "_" with "/") and then do normal base64 decoding on it.
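To make the two approaches above concrete, here is a minimal Python sketch. The function names and the sample bytes are made up for illustration (the sample is a stand-in for payload["body"]["data"], not real Gmail output), and the padding step is only there in case the trailing "=" characters were stripped.

```python
import base64


def decode_urlsafe(data: str) -> bytes:
    """Decode URL-safe base64, restoring any '=' padding that was stripped."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded)


def decode_by_substitution(data: str) -> bytes:
    """Same result using only a standard-alphabet decoder."""
    standard = data.replace("-", "+").replace("_", "/")
    standard += "=" * (-len(standard) % 4)
    return base64.b64decode(standard)


# Hypothetical stand-in for payload["body"]["data"]; not real Gmail output.
raw = b"\xfb\xff\xfe<p>hello!</p>"
sample = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

decoded_a = decode_urlsafe(sample)
decoded_b = decode_by_substitution(sample)
```

Both functions should give back the same bytes; the second one is only worth using if your library has no URL-safe decoder.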

I had the same issue decoding the 'data' fields in the message object response from the Gmail API. The Google Ruby API library wasn't decoding the text correctly either. I found I needed to do a url-safe base64 decode:
@data = Base64.urlsafe_decode64(JSON.parse(@result.data.to_json)["payload"]["body"]["data"])
Hope that helps!

Here is an example that works in both Python 2.x and 3.x:
import base64
decodedContents = base64.urlsafe_b64decode(payload["body"]["data"].encode('ASCII'))

If you only need to decode for display purposes, consider using atob to decode the messages in a JavaScript frontend (see ref). Note that atob expects the standard base64 alphabet, so replace - with + and _ with / first.

I found whilst playing with the API result, once I had drilled down to the body I was given an option to decode in the available methods.
val message = mService!!.users().messages().get(user, id).setFormat("full").execute()
println("Message snippet: " + message.snippet)
if (message.payload.mimeType == "text/plain") {
    val body = message.payload.body.decodeData() // getValue("body")
    Log.i("BODY", body.toString(Charset.defaultCharset()))
}
The result:-
com.example.quickstart I/BODY: ISOLATE NORMAL: 514471,Fap, South Point Rolleston, 55 Faringdon Boulevard , Rolleston, 30 May 2018 20:59:21

I copied the base64 text to a file (b64.txt), then base64-decoded it using base64 (from coreutils) with the -d option (see http://linux.die.net/man/1/base64) and I got text that was perfectly readable. The command I used was:
cat b64.txt | base64 -d


Requests fail authorization when query string contains certain characters

I'm making requests to Twitter, using the OAuth1.0 signing process to set the Authorization header. They explain it step-by-step here, which I've followed. It all works, most of the time.
Authorization fails whenever special characters are sent without percent encoding in the query component of the request. For example, ?status=hello%20world! fails, but ?status=hello%20world%21 succeeds. But the change from ! to the percent encoded form %21 is only made in the URL, after the signature is generated.
So I'm confused as to why this fails, because AFAIK that's a legally encoded query string. Only the raw strings ("status", "hello world!") are used for signature generation, and I'd assume the server would remove any percent encoding from the query params and generate its own signature for comparison.
When it comes to building the URL, I let URLComponents do the work, so I don't add percent encoding manually, ex.
var urlComps = URLComponents()
urlComps.scheme = "https"
urlComps.host = host
urlComps.path = path
urlComps.queryItems = [URLQueryItem(name: "status", value: "hello world!")]
urlComps.percentEncodedQuery // "status=hello%20world!"
I wanted to see how Postman handled the same request. I selected OAuth1.0 as the Auth type and plugged in the same credentials. The request succeeded. I checked the Postman console and saw ?status=hello%20world%21; it was percent encoding the !. I updated Postman, because a nice little prompt asked me to. Then I tried the same request; now it was getting an authorization failure, and I saw ?status=hello%20world! in the console; the ! was no longer being percent encoded.
I'm wondering who is at fault here. Perhaps Postman and I are making the same mistake. Perhaps it's with Twitter. Or perhaps there's some proxy along the way that idk, double encodes my !.
The OAuth1.0 spec says this, which I believe is in the context of both client (taking a request that's ready to go and signing it before it's sent), and server (for generating another signature to compare against the one received):
The parameters from the following sources are collected into a
single list of name/value pairs:
The query component of the HTTP request URI as defined by
[RFC3986], Section 3.4. The query component is parsed into a list
of name/value pairs by treating it as an
"application/x-www-form-urlencoded" string, separating the names
and values and decoding them as defined by
[W3C.REC-html40-19980424], Section 17.13.4.
That last reference, here, outlines the encoding for application/x-www-form-urlencoded, and says that space characters should be replaced with +, non-alphanumeric characters should be percent encoded, name separated from value by =, and pairs separated by &.
So, the OAuth1.0 spec says that the query string of the URL needs to be decoded as defined by application/x-www-form-urlencoded. Does that mean that our query string needs to be encoded this way too?
It seems to me, if a request is to be signed using OAuth1.0, the query component of the URL that gets sent must be encoded in a way that is different to what it would normally be encoded in? That's a pretty significant detail if you ask me. And I haven't seen it explicitly mentioned, even in Twitter's documentation. And evidently the folks at Postman overlooked it too? Unless I'm not supposed to be using URLComponents to build a URL, but that's what it's for, no? Have I understood this correctly?
Note: ?status=hello+world%21 succeeds; it tweets "hello world!"
I ran into a similar issue.
Put the status in the POST body, not the query string.
Percent-encoding:
private encode(str: string) {
    // encodeURIComponent() escapes all characters except: A-Z a-z 0-9 - _ . ! ~ * ' ( )
    // RFC 3986 section 2.3 Unreserved Characters (January 2005): A-Z a-z 0-9 - _ . ~
    return encodeURIComponent(str)
        .replace(/[!'()*]/g, c => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}
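For comparison, the same strict percent-encoding can be sketched in Python with the standard library. This is only an illustration of the idea (escape everything outside the RFC 3986 unreserved set); oauth_percent_encode is a made-up name, and I have not tested it against Twitter.

```python
from urllib.parse import parse_qsl, quote


def oauth_percent_encode(value: str) -> str:
    """Escape everything except RFC 3986 unreserved characters."""
    # Letters, digits, '-', '.' and '_' are always left alone by quote();
    # '~' is passed explicitly so older Python versions keep it too.
    return quote(value, safe="~")


encoded = oauth_percent_encode("hello world!")

# A form-urlencoded parser decodes all three spellings to the same pair,
# which is why the '!' form looks like it ought to be accepted.
strict = parse_qsl("status=hello%20world%21")
lenient = parse_qsl("status=hello%20world!")
plus_form = parse_qsl("status=hello+world%21")
```

So the three query strings carry the same parameter after decoding; the difference only matters if the server builds its signature from something other than the decoded values.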

Encoding a GPX file such that it's accepted by the /matchroute endpoint of the Here API

I am trying to call the resource /matchroute via a GET request.
However, I can't figure out how to encode the GPX file so that the resource accepts my request: I always receive HTTP error 400 as a response from the Here server.
As exemplary data I used the following file:
<?xml version="1.0"?>
<gpx version="1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/0"
xsi:schemaLocation="http://www.topografix.com/GPX/1/0
http://www.topografix.com/GPX/1/0/gpx.xsd">
<trk>
<trkseg>
<trkpt lat="51.10177" lon="0.39349"/>
<trkpt lat="51.10181" lon="0.39335"/>
<trkpt lat="51.10255" lon="0.39366"/>
<trkpt lat="51.10398" lon="0.39466"/>
<trkpt lat="51.10501" lon="0.39533"/>
</trkseg>
</trk>
</gpx>
that I got from this example.
I encoded this file using MATLAB's function matlab.net.base64encode which yielded the following base64-encoded string:
PD94bWwgdmVyc2lvbj0iMS4wIj8+PGdweCB2ZXJzaW9uPSIxLjAieG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8y
MDAxL1hNTFNjaGVtYS1pbnN0YW5jZSJ4bWxucz0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wInhzaTpz
Y2hlbWFMb2NhdGlvbj0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wIGh0dHA6Ly93d3cudG9wb2dyYWZp
eC5jb20vR1BYLzEvMC9ncHgueHNkIj48dHJrPjx0cmtzZWc+PHRya3B0IGxhdD0iNTEuMTAxNzciIGxvbj0iMC4zOTM0
OSIvPjx0cmtwdCBsYXQ9IjUxLjEwMTgxIiBsb249IjAuMzkzMzUiLz48dHJrcHQgbGF0PSI1MS4xMDI1NSIgbG9uPSIw
LjM5MzY2Ii8+PHRya3B0IGxhdD0iNTEuMTAzOTgiIGxvbj0iMC4zOTQ2NiIvPjx0cmtwdCBsYXQ9IjUxLjEwNTAxIiBs
b249IjAuMzk1MzMiLz48L3Rya3NlZz48L3Ryaz48L2dweD4=
However, as stated before, the HERE server consistently responds with HTTP-error 400 to my request
https://rme.api.here.com/2/matchroute.json?app_id={app_id}&app_code={app_code}&routemode=car&file=...
where "..." equals the above mentioned base64-encoded string.
Question: Could anyone please provide a code sample showing how to encode the above mentioned GPX file correctly (ideally in MATLAB language) so that the /matchroute resource is able to respond?
Remarks:
If I use the base64 string
UEsDBBQAAAAIANmztEQSwaeZzwAAAM8BAAAQAAAAc2FtcGxlLXRyYWNlLmdweIXPTQuCMBwG8HufQnZv%2F605S0k9dj
EIungdZjpSJ27kPn6%2BRBgYXcYYv2cPzzG2deU8805L1YSIYoLiaHMsWvv9uBlYowOrZYhKY9oAoO973DOsugJ2hFBI
z8k1K%2FNabGWjjWiy%2FJ36ShjVqqITd2lxpmo4XVKgMP6vZaCneKIyYabivzHnr4BhCbb6hoZRpnvMp86L%2BdIapx
ImRJxiSuh%2Bj5xq7CWY%2Bcz1EaypA10qxlfVjvOl8rxVxfzDQrk%2FFCfLRs7YpOCzA%2BZd49LoBVBLAQIUABQAAA
AIANmztEQSwaeZzwAAAM8BAAAQAAAAAAAAAAEAIAAAAAAAAABzYW1wbGUtdHJhY2UuZ3B4UEsFBgAAAAABAAEAPgAAAP
0AAAAAAA%3D%3D
from this example the GET request works. However, I couldn't figure out how to reproduce this encoding myself so that I am able to encode my own data accordingly.
Link to the Here API definition: https://developer.here.com/documentation/route-match/topics/resource-matchroute-request.html
Looking at the two base64 strings I can tell you the fundamental difference between them - the first one (which doesn't work) is unescaped whereas the second one (which works) is.
You can convert between the two formats manually using various online tools like this one. The escaped version of the non-working base64 string, in case you want to test it, is:
PD94bWwgdmVyc2lvbj0iMS4wIj8+PGdweCB2ZXJzaW9uPSIxLjAieG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8y
%0AMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSJ4bWxucz0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wInhza
Tpz%0AY2hlbWFMb2NhdGlvbj0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wIGh0dHA6Ly93d3cudG9wb2
dyYWZp%0AeC5jb20vR1BYLzEvMC9ncHgueHNkIj48dHJrPjx0cmtzZWc+PHRya3B0IGxhdD0iNTEuMTAxNzciIGxvbj0
iMC4zOTM0%0AOSIvPjx0cmtwdCBsYXQ9IjUxLjEwMTgxIiBsb249IjAuMzkzMzUiLz48dHJrcHQgbGF0PSI1MS4xMDI1
NSIgbG9uPSIw%0ALjM5MzY2Ii8+PHRya3B0IGxhdD0iNTEuMTAzOTgiIGxvbj0iMC4zOTQ2NiIvPjx0cmtwdCBsYXQ9I
jUxLjEwNTAxIiBs%0Ab249IjAuMzk1MzMiLz48L3Rya3NlZz48L3Ryaz48L2dweD4%3D
I'm not an expert on this, but as I understand, you need to URL-encode strings only when you want to paste them as-is into the web path of your browser (read about "URL Params"). If you construct your HTTP requests the right way™ (by this I mean specify the headers of the request and the key-value pairs correctly), you shouldn't have to worry about URL-encoding at all, since the tool that you're using (in this case, MATLAB) should take care of the conversion for you.
Unfortunately, I cannot test this theory, as I have no access to the discussed API - but I am fairly certain that this would solve your problem.
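To show what "escaping" the base64 string means here, this is a small Python sketch using only the standard library. The shortened GPX document is a placeholder, and I have not run the result against the HERE API, so treat it as an illustration of the encoding step only.

```python
import base64
from urllib.parse import quote, unquote

# Shortened placeholder GPX; the real file would be read from disk.
gpx = b'<?xml version="1.0"?><gpx version="1.0"><trk/></gpx>'

# Step 1: plain base64 (may contain '+', '/' and '=').
b64 = base64.b64encode(gpx).decode("ascii")

# Step 2: percent-encode every reserved character for use in a query string.
escaped = quote(b64, safe="")

# Reversing both steps gives back the original bytes.
restored = base64.b64decode(unquote(escaped))
```

The point of safe="" is that "+" would otherwise be at risk of being read as a space on the server side, and "/" and "=" have their own meaning in a URL.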
I had the exact same problem.
The documentation seems to be incomplete. You can check here for additional information. Several ways I solved this:
Use filetype='CSV' or filetype='GPX' as a parameter. The documentation says the filetype is guessed, but that is actually not true: after I passed an XML file, the API told me my file didn't look like a 'CSV'.
Compression is OPTIONAL, and I suggest avoiding it completely; I could not find a suitable compression either. It works fine with plain base64 encoding.
I suggest actually using CSV, because the XML returns parsing errors.
In Python:
import base64
import os
import requests

data='''latitude,longitude
51.10177,0.39349
'''
r = requests.get('https://rme.api.here.com/2/matchroute.json?app_id={APP_ID}&app_code={APP_CODE}&routemode=car&file={file}&filetype={filetype}'.format(
    APP_ID=os.getenv('HERE_APP_ID'),
    APP_CODE=os.getenv('HERE_APP_CODE'),
    filetype='CSV',
    file=base64.b64encode(data.encode()).decode()
))

How can I reverse engineer the encode method used here?

I have a string:
RP581147238IN which gets encoded as A3294Fc0Mb0V1Tb4aBK8rw==
and another string:
RP581147239IN which gets encoded as A3294Fc0Mb1BPqxRDrRXjQ==
But after spending a day, I still cannot figure out what the encoding process is.
The encoded string looks like it's base64 encoded.
But when I decode it, it looks like:
base64.b64decode("A3294Fc0Mb1BPqxRDrRXjQ==")
\x03}\xbd\xe0W41\xbdA>\xacQ\x0e\xb4W\x8d
The base64-decoded string now looks like a zlib-compressed string.
I've tried to further use zlib decompression methods, but none of them worked.
import zlib, base64
rt = 'A3294Fc0Mb1BPqxRDrRXjQ=='
for i in range(-50, 50):
    try:
        print(zlib.decompress(base64.b64decode(rt), i))
        print("{} worked".format(i))
        break
    except Exception:
        pass
But that did not produce any results either.
Can anybody figure out what encoding process is used here? @Nirlzr, I am looking at you for the heroic answer you provided in Reverse Engineer HTTP request.
The strings seem to be Base64 encoded, and the underlying decoded data seems to be encrypted. Encrypted data cannot be directly represented as a string, and it is common to Base64-encode encrypted data when a string is required.
If this is the case, you need to decrypt the decoded data, and in order to accomplish that you would need the encryption key.
Note: In general it is not productive to compress such short items.
If you put your data strings side by side:
RP581147238IN A3294Fc0Mb0V1Tb4aBK8rw==
RP581147239IN A3294Fc0Mb1BPqxRDrRXjQ==
You can see that the source strings differ in only one character, but the encoded versions differ in 12 characters:
----------8-- ----------0V1Tb4aBK8rw--
----------9-- ----------1BPqxRDrRXjQ--
The encoded data has padding at the end similar to base64, but it is definitely not base64. It was probably encrypted with some SHA-like algorithm. With the data you provided, I would say that it is not possible to reverse-engineer the encoding process. More data would probably not help much either.
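One way to check the observations in these answers is to decode both strings and compare them byte by byte. This is just a sketch with the two strings from the question; it does not recover the algorithm or the key.

```python
import base64

first = base64.b64decode("A3294Fc0Mb0V1Tb4aBK8rw==")
second = base64.b64decode("A3294Fc0Mb1BPqxRDrRXjQ==")

# Count how many leading bytes the two decoded values share.
shared = 0
for a, b in zip(first, second):
    if a != b:
        break
    shared += 1

print(len(first), len(second), shared)
```

Both values decode to 16 bytes and share their first 8 bytes. That would be consistent with, though certainly not proof of, a cipher working on 8-byte blocks where only the second block of the 13-character input changed. Without the key that is still as far as this goes.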

Need to find the requests equivalent of urlopen() from urllib2

I am currently trying to modify a script to use the requests library instead of the urllib2 library. I haven't really used it before and I am looking to do the equivalent of urlopen("http://www.example.org").read(), so I tried the requests.get("http://www.example.org").text function.
This works fine with normal everyday html, however when I fetch from this url (https://gtfsrt.api.translink.com.au/Feed/SEQ) it doesn't seem to work.
So I wrote the below code to print out the responses from the same url using both the requests and urllib2 libraries.
import urllib2
import requests
#urllib2 request
request = urllib2.Request("https://gtfsrt.api.translink.com.au/Feed/SEQ")
result = urllib2.urlopen(request)
#requests request
result2 = requests.get("https://gtfsrt.api.translink.com.au/Feed/SEQ")
print result2.encoding
#urllib2 write to text
open("Output.txt", 'w').close()
text_file = open("Output.txt", "w")
text_file.write(result.read())
text_file.close()
open("Output2.txt", 'w').close()
text_file = open("Output2.txt", "w")
text_file.write(result2.text)
text_file.close()
urlopen().read() works fine, but requests.get().text doesn't work for this URL. I suspect it has something to do with encoding, but I don't know what. Any thoughts?
Note: The supplied URL is a feed in the Google protocol buffer format; once I receive the message, I give the feed to a Google library that interprets it.
Your issue is that you're making the requests module interpret binary content in a response as text.
A response from the requests library has two main ways to access the body of the response:
Response.content - will return the response body as a bytestring
Response.text - will decode the response body as text and return unicode
Since protocol buffers are a binary format, you should use result2.content in your code instead of result2.text.
Response.content will return the body of the response as-is, in bytes. For binary content this is exactly what you want. For text content that contains non-ASCII characters this means the content must have been encoded by the server into a bytestring using a particular encoding that is indicated by either a HTTP header or a <meta charset="..." /> tag. In order to make sense of those bytes they therefore need to be decoded after receiving using that charset.
Response.text, in turn, is a convenience property that does exactly this for you. It assumes the response body is text, looks at the response headers to find the encoding, and decodes it for you, returning unicode.
But if your response doesn't contain text, this is the wrong method to use. Binary content doesn't contain characters, because it's not text, so the whole concept of character encoding does not make any sense for binary content - it's only applicable to text composed of characters. (That's also why you're seeing response.encoding == None - it's just bytes, there is no character encoding involved).
See Response Content and Binary Response Content in the requests documentation for more details.
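To see why decoding binary data as text is destructive, here is a standard-library-only sketch that does not need the network. The payload bytes are made up for illustration, and it assumes the text decoding replaces invalid bytes rather than raising, which is my understanding of what Response.text does.

```python
# Hypothetical protobuf-like payload; the bytes are made up for illustration.
payload = bytes([0x0A, 0x03, 0x31, 0x2E, 0x30, 0x10, 0xFF, 0xFE, 0x80])

# Roughly what happens when a binary body is treated as text:
# bytes that are not valid UTF-8 become U+FFFD replacement characters.
as_text = payload.decode("utf-8", errors="replace")

# Encoding the text again does not give the original bytes back.
round_tripped = as_text.encode("utf-8")

print(round_tripped == payload)
```

Once the replacement has happened the original bytes are gone, so a protobuf parser fed from the text version cannot work; keeping the body as bytes (Response.content) avoids the problem entirely.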

MVC HttpUtility.UrlEncode

I am attempting to use HttpUtility.UrlEncode to encode strings that ultimately are used in URLs.
example
/string/http://www.google.com
or
/string/my test string
where http://www.google.com is a parameter passed to a controller.
I have tried UrlEncode but it doesn't seem to work quite right
my route looks like:
routes.MapRoute(
"mStringView",
"mString/{sText}",
new { controller = "mString", action = "Index", sText = UrlParameter.Optional }
);
The problem is that the encoded parts seem to be decoded somewhere in the routing, except that characters like "+" (which replaces " ") are not decoded.
Given my case, where a UrlParameter can be any string, including URLs, what is the best way to encode them before pushing them into my DB, and then handle the decoding, knowing they will be passed to a controller as a parameter?
thanks!
It seems this problem has come up in other forums, and the general recommendation is not to rely on standard URL encoding for ASP.NET MVC. The reasoning is that URL encoding is not necessarily as user friendly as we want, and user friendliness is one of the goals of custom routed URLs. For example, this:
http://server.com/products/Goods+%26+Services
can be written in a friendlier way as
http://server.com/products/Goods-and-Services
So custom url encoding has advantages beyond working around this quirk/bug. More details and examples here:
http://www.dominicpettifer.co.uk/Blog/34/asp-net-mvc-and-clean-seo-friendly-urls
You could convert the parameter to byte array and use the HttpServerUtility.UrlTokenEncode
If the problem is that the "+" doesn't get decoded, use HttpUtility.UrlPathEncode to encode and the decoding will work as desired.
From the documentation of HttpUtility.UrlEncode:
You can encode a URL using with the UrlEncode method or the
UrlPathEncode method. However, the methods return different results.
The UrlEncode method converts each space character to a plus character
(+). The UrlPathEncode method converts each space character into the
string "%20", which represents a space in hexadecimal notation. Use
the UrlPathEncode method when you encode the path portion of a URL in
order to guarantee a consistent decoded URL, regardless of which
platform or browser performs the decoding.
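The same "+" versus "%20" mismatch can be reproduced outside .NET. The sketch below uses Python's urllib.parse purely as an analogy (quote_plus behaves like form-style encoding, quote like path-style encoding); it is not the ASP.NET API and says nothing about how MVC routing decodes internally.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "my test string"

form_style = quote_plus(text)       # spaces become '+'
path_style = quote(text, safe="")   # spaces become '%20'

# A path-style decoder leaves '+' untouched, so the round trip breaks.
broken = unquote(form_style)

# Matching encoder/decoder pairs round-trip cleanly.
ok_path = unquote(path_style)
ok_form = unquote_plus(form_style)

# A whole URL passed as a single path segment needs ':' and '/' escaped too.
url_param = quote("http://www.google.com", safe="")
```

In other words, the "+" only survives decoding when both sides agree it means a space, which is the situation the quoted documentation warns about for the path portion of a URL.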