I have problem with ipn from paypal. I'm sending them data utf-8 encoded(e.g. "Naročilo št" as item name) and when somebody pays they send me response which is malformed: "Naro�~D�~Milo �~E¡t."(response claims that is UTF-8 encoded) and then when I try to validate that payment I get that it's invalid.
I've tried to change the "buy button" encoding on my paypal profile but it's not working(I still get response with wrongly encoded characters). Anybody have an idea how to fix that issue? And I would rather avoid transforming item names to plain ascii or something similar.
Solution by OP.
According to this
When the Servlet container receives the request, it always pass the request parameters to your program decoded in ISO-8859-1 encoding. (e.g. Browser encoded in UTF-8 but container decoded in ISO-8859-1.) Thus your servlet or JSP will always receive garbage for characters other than ISO-8859-1 encoding.
so the solution to my problem was acquiring request parameters that way:
String value = request.getParameter("mytext");
try{
value = new String(value.getBytes("8859_1"), "UTF-8");
}catch(java.io.UnsupportedEncodingException e){
System.err.println(e);
}
Related
I'm making requests to Twitter, using the OAuth1.0 signing process to set the Authorization header. They explain it step-by-step here, which I've followed. It all works, most of the time.
Authorization fails whenever special characters are sent without percent encoding in the query component of the request. For example, ?status=hello%20world! fails, but ?status=hello%20world%21 succeeds. But the change from ! to the percent encoded form %21 is only made in the URL, after the signature is generated.
So I'm confused as to why this fails, because AFAIK that's a legally encoded query string. Only the raw strings ("status", "hello world!") are used for signature generation, and I'd assume the server would remove any percent encoding from the query params and generate its own signature for comparison.
When it comes to building the URL, I let URLComponents do the work, so I don't add percent encoding manually, ex.
var urlComps = URLComponents()
urlComps.scheme = "https"
urlComps.host = host
urlComps.path = path
urlComps.queryItems = [URLQueryItem(key: "status", value: "hello world!")]
urlComps.percentEncodedQuery // "status=hello%20world!"
I wanted to see how Postman handled the same request. I selected OAuth1.0 as the Auth type and plugged in the same credentials. The request succeeded. I checked the Postman console and saw ?status=hello%20world%21; it was percent encoding the !. I updated Postman, because a nice little prompt asked me to. Then I tried the same request; now it was getting an authorization failure, and I saw ?status=hello%20world! in the console; the ! was no longer being percent encoded.
I'm wondering who is at fault here. Perhaps Postman and I are making the same mistake. Perhaps it's with Twitter. Or perhaps there's some proxy along the way that idk, double encodes my !.
The OAuth1.0 spec says this, which I believe is in the context of both client (taking a request that's ready to go and signing it before it's sent), and server (for generating another signature to compare against the one received):
The parameters from the following sources are collected into a
single list of name/value pairs:
The query component of the HTTP request URI as defined by
[RFC3986], Section 3.4. The query component is parsed into a list
of name/value pairs by treating it as an
"application/x-www-form-urlencoded" string, separating the names
and values and decoding them as defined by
[W3C.REC-html40-19980424], Section 17.13.4.
That last reference, here, outlines the encoding for application/x-www-form-urlencoded, and says that space characters should be replaced with +, non-alphanumeric characters should be percent encoded, name separated from value by =, and pairs separated by &.
So, the OAuth1.0 spec says that the query string of the URL needs to be decoded as defined by application/x-www-form-urlencoded. Does that mean that our query string needs to be encoded this way too?
It seems to me, if a request is to be signed using OAuth1.0, the query component of the URL that gets sent must be encoded in a way that is different to what it would normally be encoded in? That's a pretty significant detail if you ask me. And I haven't seen it explicitly mentioned, even in Twitter's documentation. And evidently the folks at Postman overlooked it too? Unless I'm not supposed to be using URLComponents to build a URL, but that's what it's for, no? Have I understood this correctly?
Note: ?status=hello+world%21 succeeds; it tweets "hello world!"
I ran into a similar issue.
put the status in post body, not query string.
Percent-encoding:
private encode(str: string) {
// encodeURIComponent() escapes all characters except: A-Z a-z 0-9 - _ . ! ~ * " ( )
// RFC 3986 section 2.3 Unreserved Characters (January 2005): A-Z a-z 0-9 - _ . ~
return encodeURIComponent(str)
.replace(/[!'()*]/g, c => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}
I'm trying to pass a complete URL as a parameter to a java-based REST service (GET), but I'm not sure how to format it in order to avoid a "HTTP 400 Bad Request". I've tried Base64 encoding, but still get the 400 error. I think part of the problem is that the url contains a question mark, "?", since it seems to be fine if I remove that and pass the url as-is. I'm not sure what is the problem when its encoded.
example url - http://my.site.com/testing-service?some+parms
method annotations:
#GET
#Path("/{fullurl}")
#Produces("application/json")
public Response findByUrl(#PathParam("fullurl") String fullurl)
...
(I've updated the description with a little more detail per the first couple of comments)
Apparently the encoding approach was close, but Base64 (java or commons-code) didn't work for whatever reason (length perhaps?). I found switching to Base32 (commons-code) works for my situation.
I am currently trying to modify a script to use the requests library instead of the urllib2 library. I haven't really used it before and I am looking to do the equivalent of urlopen("http://www.example.org").read(), so I tried the requests.get("http://www.example.org").text function.
This works fine with normal everyday html, however when I fetch from this url (https://gtfsrt.api.translink.com.au/Feed/SEQ) it doesn't seem to work.
So I wrote the below code to print out the responses from the same url using both the requests and urllib2 libraries.
import urllib2
import requests
#urllib2 request
request = urllib2.Request("https://gtfsrt.api.translink.com.au/Feed/SEQ")
result = urllib2.urlopen(request)
#requests request
result2 = requests.get("https://gtfsrt.api.translink.com.au/Feed/SEQ")
print result2.encoding
#urllib2 write to text
open("Output.txt", 'w').close()
text_file = open("Output.txt", "w")
text_file.write(result.read())
text_file.close()
open("Output2.txt", 'w').close()
text_file = open("Output2.txt", "w")
text_file.write(result2.text)
text_file.close()
The openurl().read() works fine but the requests.get().text doesn't work for the given this url. I suspect it has something to do with encoding, but i don't know what. Any thoughts?
Note: The supplied url is a feed in the google protocol buffer format, once I receive the message i give the feed to a google library that interprets it.
Your issue is that you're making the requests module interpret binary content in a response as text.
A response from the requests library has two main way to access the body of the response:
Response.content - will return the response body as a bytestring
Response.text - will decode the response body as text and return unicode
Since protocol buffers are a binary format, you should use result2.content in your code instead of result2.text.
Response.content will return the body of the response as-is, in bytes. For binary content this is exactly what you want. For text content that contains non-ASCII characters this means the content must have been encoded by the server into a bytestring using a particular encoding that is indicated by either a HTTP header or a <meta charset="..." /> tag. In order to make sense of those bytes they therefore need to be decoded after receiving using that charset.
Response.text now is a convenience method that does exactly this for you. It assumes the response body is text, and looks at the response headers to find the encoding, and decodes it for you, returning unicode.
But if your response doesn't contain text, this is the wrong method to use. Binary content doesn't contain characters, because it's not text, so the whole concept of character encoding does not make any sense for binary content - it's only applicable to text composed of characters. (That's also why you're seeing response.encoding == None - it's just bytes, there is no character encoding involved).
See Response Content and Binary Response Content in the requests documentation for more details.
We have a system that sends out regular emails with links in, many of which contain URL encoded parameters such as this:
href="http://www.mydomain.com/login.aspx?returnurl=http%3A%2F%2Fwww.mydomain.com%2Fview.aspx%3Fid%3D1234%26alert%3Dtrue"
You can see that the "returnurl" parameter is encoded. However, it seems that a large number of our users (seemingly hotmail) are receiving the emails with this paramater partly decoded such as:
href="http://www.mydomain.com/login.aspx?returnurl=http://www.mydomain.com/view.aspx?view.aspx%3Fid%3D1234%26alert%3Dtrue"
Why would it decode like this? Why only partly decode?? I therefore have no idea how to deal with it. I thought of base-64 encoding but that base64 strings contain characters that would need decoding too... I thought of double encoding but then I will not know whether to double-decode the parameter or not... Can anyone help? Thanks.
One reason this could be happening is because url rules for encoding are different before and after ? so if mechanism that is doing decoding does it from the 'back' of url and apples query decoding rules until it finds first ? then this could cause problem you are describing...
Not sure how to deal with it though as I understand system that does this inappropriate decoding is outside of your control. I would try to hide the ? in return url query somehow...
I'm trying to make an "IMAP fetch" command to retrieve the message body. I need to pass/use the correct charset, otherwise the response will come with special characters.
How can I make the IMAP request/command to consider the charset I received in the BODYSTRUCTURE result??
The IMAP server will send non-ASCII message body data as an 8-bit "literal", which is essentially an array of bytes. You can't make the FETCH command use a specific charset, as far as I know.
It's up to the IMAP library or your application to decode the byte array into a string representation using the appropriate charset you received in the BODYSTRUCTURE response.