I'm using the Grails rest plugin, and having issues with parameters containing an ampersand. Here is an example of my query:
def query = [
    method: 'artist.getinfo',
    artist: 'Matt & Kim',
    format: 'json'
]
withRest(uri: 'http://ws.audioscrobbler.com/') {
    def resp = get(path: '/2.0/', query: query)
}
I think that the get method should automatically URL encode the parameters in query - it correctly converts spaces to '+'. However, it leaves the ampersand as is, which is incorrect (it should be encoded to %26).
I tried manually encoding the artist name before calling get, but then the rest plugin encodes the percent sign!
I turned on logging for the rest client, so I can see what URLs it's requesting.
Originally: http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&artist=Matt+&+Kim&format=json
If I manually encode the name: http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&artist=Matt+%2526+Kim&format=json
Do I need to set an encoding type? (the last.fm API specifies UTF-8) Is this a bug?
As of version 0.7, the rest plugin is using a version of HTTPBuilder which has issues encoding (and decoding) the ampersand character.
There is a JIRA issue about this with a suggested workaround (upgrading HTTPBuilder to >= 0.5.2).
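For comparison, a correctly form-encoded query leaves no bare ampersands. A minimal sketch using Python's standard library (not the Grails plugin, just an illustration of the expected output):

```python
from urllib.parse import urlencode

# The same query map as in the Groovy example above. urlencode()
# form-encodes each pair: spaces become '+', and the literal '&' in the
# artist name becomes '%26' so it can't be mistaken for a pair separator.
query = {
    "method": "artist.getinfo",
    "artist": "Matt & Kim",
    "format": "json",
}
qs = urlencode(query)
print(qs)  # method=artist.getinfo&artist=Matt+%26+Kim&format=json
```

This is the encoding the fixed HTTPBuilder (>= 0.5.2) should produce as well.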
I'm making requests to Twitter, using the OAuth1.0 signing process to set the Authorization header. They explain it step-by-step here, which I've followed. It all works, most of the time.
Authorization fails whenever special characters are sent without percent encoding in the query component of the request. For example, ?status=hello%20world! fails, but ?status=hello%20world%21 succeeds. But the change from ! to the percent encoded form %21 is only made in the URL, after the signature is generated.
So I'm confused as to why this fails, because AFAIK that's a legally encoded query string. Only the raw strings ("status", "hello world!") are used for signature generation, and I'd assume the server would remove any percent encoding from the query params and generate its own signature for comparison.
When it comes to building the URL, I let URLComponents do the work, so I don't add percent encoding manually, e.g.:
var urlComps = URLComponents()
urlComps.scheme = "https"
urlComps.host = host
urlComps.path = path
urlComps.queryItems = [URLQueryItem(name: "status", value: "hello world!")]
urlComps.percentEncodedQuery // "status=hello%20world!"
I wanted to see how Postman handled the same request. I selected OAuth1.0 as the Auth type and plugged in the same credentials. The request succeeded. I checked the Postman console and saw ?status=hello%20world%21; it was percent encoding the !. I updated Postman, because a nice little prompt asked me to. Then I tried the same request; now it was getting an authorization failure, and I saw ?status=hello%20world! in the console; the ! was no longer being percent encoded.
I'm wondering who is at fault here. Perhaps Postman and I are making the same mistake. Perhaps it's Twitter. Or perhaps there's some proxy along the way that somehow double-encodes my !.
The OAuth1.0 spec says this, which I believe is in the context of both client (taking a request that's ready to go and signing it before it's sent), and server (for generating another signature to compare against the one received):
The parameters from the following sources are collected into a
single list of name/value pairs:
The query component of the HTTP request URI as defined by
[RFC3986], Section 3.4. The query component is parsed into a list
of name/value pairs by treating it as an
"application/x-www-form-urlencoded" string, separating the names
and values and decoding them as defined by
[W3C.REC-html40-19980424], Section 17.13.4.
That last reference, here, outlines the encoding for application/x-www-form-urlencoded, and says that space characters should be replaced with +, non-alphanumeric characters should be percent encoded, name separated from value by =, and pairs separated by &.
So, the OAuth1.0 spec says that the query string of the URL needs to be decoded as defined by application/x-www-form-urlencoded. Does that mean that our query string needs to be encoded this way too?
It seems to me that if a request is to be signed using OAuth 1.0, the query component of the URL that gets sent must be encoded differently from how it would normally be encoded. That's a pretty significant detail if you ask me, and I haven't seen it explicitly mentioned, even in Twitter's documentation. And evidently the folks at Postman overlooked it too? Unless I'm not supposed to be using URLComponents to build a URL, but that's what it's for, no? Have I understood this correctly?
Note: ?status=hello+world%21 succeeds; it tweets "hello world!"
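The strict encoding the signature base string (and, apparently, the transmitted URL) requires can be sketched in Python; `oauth_escape` is a hypothetical helper name, built on the stdlib `quote` function:

```python
from urllib.parse import quote

def oauth_escape(s: str) -> str:
    # RFC 5849 (OAuth 1.0) section 3.6: percent-encode every character
    # except the RFC 3986 unreserved set A-Z a-z 0-9 '-' '.' '_' '~'.
    # Python's quote() already treats exactly that set as always-safe;
    # safe="" drops the default '/' so it gets escaped too.
    return quote(s, safe="")

# '!' is a legal query character per RFC 3986, but for the signature base
# string it must be sent as %21 -- the form Twitter accepted in the question:
print(oauth_escape("hello world!"))  # hello%20world%21
```

This matches the `?status=hello%20world%21` variant that succeeded, which suggests Twitter expects the transmitted query to use the same strict encoding as the signature base string.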
I ran into a similar issue.
Put the status in the POST body, not in the query string.
Percent-encoding:
private encode(str: string) {
  // encodeURIComponent() escapes all characters except: A-Z a-z 0-9 - _ . ! ~ * ' ( )
  // RFC 3986 section 2.3 Unreserved Characters (January 2005): A-Z a-z 0-9 - _ . ~
  return encodeURIComponent(str)
    .replace(/[!'()*]/g, c => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}
We have a GET request that sends string characters in the URL, so we use path variables to receive them. Apparently there is no way the calling service will change how it calls our backend, so we need to be able to accept a URL with the following unencoded characters:
- When a percent sign % is sent, an HTTP 400 is returned. It does go through if the two characters following the % form a valid percent-encoded sequence.
- A backslash is converted into a forward slash. I need it to stay a backslash.
I'm guessing these might be Tomcat or servlet configuration issues.
(Spring Boot version 1.5.14.RELEASE)
Percent signs (%) should be no problem if you properly URL encode them (%25). However, slashes and backslashes will not work with Tomcat, even if you encode them (%2F and %5C).
You could set the following properties when running the application:
-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true
However, this won't fix the issue, because in this case, those encoded slashes will be recognized as real ones. So, let's say you have the following controller:
@ResponseBody
@RequestMapping("/api/{foo}")
public String getFoo(@PathVariable String foo) {
    return foo;
}
Well, then if you call /api/test%5Ctest, it won't be able to find the correct path. A solution to this problem is to use wildcard matchers and to parse the URL itself from the incoming HttpServletRequest:
@RequestMapping("/api/**")
public String getFoo(HttpServletRequest request) {
    // ...
}
Another solution is to use a completely different web container. For example, when using Jetty, this isn't a problem at all, and URL encoded slashes and backslashes will both work.
Spring Security 5 now blocks encoded percent signs by default. To allow them, create a new bean that calls setAllowUrlEncodedPercent():
@Bean
public HttpFirewall allowEncodedParamsFirewall() {
    StrictHttpFirewall firewall = new StrictHttpFirewall();
    firewall.setAllowUrlEncodedPercent(true);
    return firewall;
}
There are similar methods (setAllowUrlEncodedSlash and setAllowBackSlash) for forward and back slashes.
What you are experiencing is not specific to Spring Boot; it's a restriction imposed by HTTP itself.
The HTTP standard requires that any percent-encoded Request-URI be decoded by the origin server (cf. RFC 2616, page 36):
If the Request-URI is encoded using the "% HEX HEX" encoding [42], the
origin server MUST decode the Request-URI in order to properly
interpret the request.
As a result, it's not possible to escape the slash character reliably.
Therefore, when a slash is used in a URL, with or without encoding, it will be treated as a path separator, so it cannot be used in a Spring Boot path variable. Similar problems exist for the percent sign and the backslash.
Your best options are to use query parameters or a POST request.
In the following URL, the value test_with_/_and% is transmitted:
https://host/abc/def?text=test_with_%2F_and%25
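A quick sketch in Python of why query parameters sidestep the path restrictions entirely (stdlib only; the example value is made up):

```python
from urllib.parse import urlencode, parse_qs

# Slash, percent, and backslash are all unproblematic inside a query
# parameter value: they are percent-encoded on the way out and decoded
# losslessly on the way back in.
value = "test_with_/_and_%_and_\\"
qs = urlencode({"text": value})
print(qs)  # text=test_with_%2F_and_%25_and_%5C

# Round-trip: the server-side decoder recovers the original value.
assert parse_qs(qs)["text"] == [value]
```

None of these characters ever appear raw in the request line, so neither Tomcat nor StrictHttpFirewall has anything to object to.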
I also faced a similar problem; I used the following, so I hope it might help. It recovers the remainder of the URL that the wildcard pattern matched, directly from the incoming request:
final String path =
    request.getAttribute(HandlerMapping.PATH_WITHIN_HANDLER_MAPPING_ATTRIBUTE).toString();
final String bestMatchingPattern =
    request.getAttribute(HandlerMapping.BEST_MATCHING_PATTERN_ATTRIBUTE).toString();
// The part of the path that the wildcard matched:
String arguments = new AntPathMatcher().extractPathWithinPattern(bestMatchingPattern, path);
if (arguments != null && !arguments.isEmpty()) {
    pattern = pattern + '/' + arguments;
}
I'm using Play Framework 2.3 and the WS API to download and parse HTML pages. For non-English pages (e.g. Russian, Hebrew), I often get the wrong encoding.
Here's an example:
def test = Action.async { request =>
WS.url("http://news.walla.co.il/item/2793388").get.map { response =>
Ok(response.body)
}
}
This returns the web page's HTML. English characters are received OK, but the Hebrew letters appear as gibberish (not just when rendering; at the internal String level), like so:
<title>29 ×ר×××× ××פ××ת ×ש×××× ×× ×¤××, ××× ×©×××©× ×שר×××× - ×××××! ××ש×ת</title>
Other articles from the same web site can appear OK.
Using cURL with the same web page returns perfectly fine results, which makes me believe the problem is within the WS API.
Any ideas?
Edit:
I found a solution in this SO question.
Parsing the response as ISO-8859-1 and then converting it to UTF-8, like so:
Ok(new String(response.body.getBytes("ISO-8859-1"), response.header(CONTENT_ENCODING).getOrElse("UTF-8")))
displays correctly. So I have a working solution, but why isn't this done internally?
Ok, here the solution I ended up using in production:
def responseBody = response.header(CONTENT_TYPE).filter(_.toLowerCase.contains("charset")).fold(new String(response.body.getBytes("ISO-8859-1") , "UTF-8"))(_ => response.body)
Explanation:
If the response carries a "Content-Type" header that also specifies a charset, simply return the response body, since the WS API will use it to decode correctly; otherwise, assume the response is ISO-8859-1-encoded and convert it to UTF-8.
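The same fallback can be sketched in Python: because ISO-8859-1 maps every byte to a code point, mis-decoded text can be re-encoded to recover the original bytes and then decoded as UTF-8 (stdlib only; the Hebrew sample string is made up):

```python
title = "שלום עולם"                        # "hello world" in Hebrew
wire_bytes = title.encode("utf-8")         # the bytes the server actually sent
mangled = wire_bytes.decode("iso-8859-1")  # what a latin-1 default produces
print("×" in mangled)                      # the '×' gibberish from the question

# Re-encoding as ISO-8859-1 is lossless, so the original bytes come back
# and can be decoded with the correct charset:
repaired = mangled.encode("iso-8859-1").decode("utf-8")
assert repaired == title
```

This is exactly why the `getBytes("ISO-8859-1")` trick in the answer above works: it undoes the wrong decode without losing information.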
I am using bulbs and rexster and am trying to store nodes with unicode properties (see example below).
Apparently, creating nodes in the graph works properly as I can see the nodes in the web interface that comes with rexster (Rexster Dog House) but retrieving the same node does not work - all I get is None.
Everything works as expected when I create and look for nodes with non-unicode-specific letters in their properties.
E.g. in the following example a node with name = u'University of Cambridge' would be retrievable as expected.
Rexster version:
[INFO] Application - Rexster version [2.4.0]
Example code:
# -*- coding: utf-8 -*-
from bulbs.rexster import Graph
from bulbs.model import Node
from bulbs.property import String
from bulbs.config import DEBUG
import bulbs
class University(Node):
    element_type = 'university'
    name = String(nullable=False, indexed=True)
g = Graph()
g.add_proxy('university', University)
g.config.set_logger(DEBUG)
name = u'Université de Montréal'
g.university.create(name=name)
print g.university.index.lookup(name=name)
print bulbs.__version__
Gives the following output on the command line:
POST url: http://localhost:8182/graphs/emptygraph/tp/gremlin
POST body: {"params": {"keys": null, "index_name": "university", "data": {"element_type": "university", "name": "Universit\u00e9 de Montr\u00e9al"}}, "script": "def createIndexedVertex = {\n vertex = g.addVertex()\n index = g.idx(index_name)\n for (entry in data.entrySet()) {\n if (entry.value == null) continue;\n vertex.setProperty(entry.key,entry.value)\n if (keys == null || keys.contains(entry.key))\n\tindex.put(entry.key,String.valueOf(entry.value),vertex)\n }\n return vertex\n }\n def transaction = { final Closure closure ->\n try {\n results = closure();\n g.commit();\n return results; \n } catch (e) {\n g.rollback();\n throw e;\n }\n }\n return transaction(createIndexedVertex);"}
GET url: http://localhost:8182/graphs/emptygraph/indices/university?value=Universit%C3%A9+de+Montr%C3%A9al&key=name
GET body: None
None
0.3
Ok, I finally got to the bottom of this.
Since TinkerGraph uses a HashMap for its index, you can see what's being stored in the index by using Gremlin to return the contents of the map.
Here's what's being stored in the TinkerGraph index using your Bulbs g.university.create(name=name) method above...
$ curl http://localhost:8182/graphs/emptygraph/tp/gremlin?script="g.idx(\"university\").index"
{"results":[{"name":{"Université de Montréal":[{"name":"Université de Montréal","element_type":"university","_id":"0","_type":"vertex"}]},"element_type":{"university":[{"name":"Université de Montréal","element_type":"university","_id":"0","_type":"vertex"}]}}],"success":true,"version":"2.5.0-SNAPSHOT","queryTime":3.732632}
All that looks good -- the encodings look right.
To create and index a vertex like the one above, Bulbs uses a custom Gremlin script via an HTTP POST request with a JSON content type.
Here's the problem...
Rexster's index lookup REST endpoint uses URL query params, and Bulbs encodes URL params as UTF-8 byte strings.
To see how Rexster handles URL query params encoded as UTF-8 byte strings, I executed a Gremlin script via a URL query param that simply returns the encoded string...
$ curl http://localhost:8182/graphs/emptygraph/tp/gremlin?script="'Universit%C3%A9%20de%20Montr%C3%A9al'"
{"results":["UniversitÃ© de MontrÃ©al"],"success":true,"version":"2.5.0-SNAPSHOT","queryTime":16.59432}
Egad! That's not right. As you can see, that text is mangled.
In a twist of irony, we have Gremlin returning gremlins, and that's what Rexster is using for the key's value in the index lookup, which as we can see is not what's stored in TinkerGraph's HashMap index.
Here's what's going on...
This is what the unquoted byte string looks like in Bulbs:
>>> name
u'Universit\xe9 de Montr\xe9al'
>>> bulbs.utils.to_bytes(name)
'Universit\xc3\xa9 de Montr\xc3\xa9al'
'\xc3\xa9' is the UTF-8 encoding of the unicode character u'\xe9' (which can also be specified as u'\u00e9').
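These byte values are easy to verify in Python 3 (stdlib only; a sketch, not Bulbs code):

```python
name = "Université de Montréal"
utf8 = name.encode("utf-8")

# 'é' (U+00E9) takes two bytes in UTF-8:
assert b"\xc3\xa9" in utf8
assert "\xe9" == "\u00e9" == "é"

# A server that decodes those UTF-8 bytes as latin-1 splits each 'é' into
# two characters -- the mangled text Rexster returned:
print(utf8.decode("iso-8859-1"))  # UniversitÃ© de MontrÃ©al
```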
UTF-8 uses two bytes to encode a character like é, and Jersey/Grizzly 1.x (Rexster's app server) has a bug where it doesn't properly handle multi-byte character encodings like UTF-8.
See http://markmail.org/message/w6ipdpkpmyghdx2p
It looks like this is fixed in Jersey/Grizzly 2.0, but switching Rexster from Jersey/Grizzly 1.x to Jersey/Grizzly 2.x is a big ordeal.
Last year TinkerPop decided to switch to Netty instead, and so for the TinkerPop 3 release this summer, Rexster is in the process of morphing into Gremlin Server, which is based on Netty rather than Grizzly.
Until then, here are a few workarounds...
Since Grizzly can't handle multi-byte encodings like UTF-8, client libraries need to encode URL params as 1-byte latin1 encodings (AKA ISO-8859-1), which is Grizzly's default encoding.
Here's the same value encoded as a latin1 byte string...
$ curl http://localhost:8182/graphs/emptygraph/tp/gremlin?script="'Universit%E9%20de%20Montr%E9al'"
{"results":["Université de Montréal"],"success":true,"version":"2.5.0-SNAPSHOT","queryTime":17.765313}
As you can see, using a latin1 encoding works in this case.
However, for general purposes, it's probably best for client libraries to use a custom Gremlin script via an HTTP POST request with a JSON content type and thus avoid the URL param encoding issue altogether -- this is what Bulbs is going to do, and I'll push the Bulbs update to GitHub later today.
UPDATE: It turns out that even though we cannot change Grizzly's default encoding type, we can specify UTF-8 as the charset in the HTTP request Content-Type header and Grizzly will use it. Bulbs 0.3.29 has been updated to include the UTF-8 charset in its request header, and all tests pass. The update has been pushed to both GitHub and PyPi.
I am new to REST and Jersey. I wrote a simple RESTful web service using the Jersey 1.17 API. The web service accepts data through the POST method. When I pass data containing non-ASCII characters, it does not read them correctly.
@POST
@Path("hello")
@Consumes(MediaType.APPLICATION_FORM_URLENCODED + ";charset=UTF-8")
public Response hello(@FormParam("message") String message) {
    System.out.println(message);
    return Response.status(200).entity("hello" + message).build();
}
When I pass non-ASCII characters in the 'message' parameter, it does not print them correctly.
curl --data "message=A função, Ãugent" http://localhost:8080/search/hello/
The POST method prints "A fun??o, ?ugent".
I do not think Jersey cares about the charset defined in @Consumes. I guess Jersey simply uses the request.getParameter method, which uses the encoding of the request to resolve parameters.
You have several options to set the encoding:
1. If the servlet container supports it, set the default encoding of the connector.
2. Set the default encoding of the JVM to UTF-8.
3. Create a servlet filter that catches this call and calls request.setCharacterEncoding("UTF-8"). In this case you must ensure that setCharacterEncoding is called before any getter (like getParameter), because the character encoding is fixed during the first get call on the request.
4. Transform the parameter value by hand. You can get the ServletRequest and query its encoding. After that you can write:
new String(message.getBytes(currentEncoding), "UTF8");
In your case I would prefer the third one.