I know how to encode a string in URL format (the smiley face is intentional):
let str = "www.mywebsite.com/😀.html"
let escapedStr = str.stringByAddingPercentEncodingWithAllowedCharacters(NSCharacterSet.URLPathAllowedCharacterSet())!
print(escapedStr)
// Output:
// www.mywebsite.com/%F0%9F%98%80.html
But if I attach http:// to the unescaped string Swift escapes the colon too:
let str = "http://www.mywebsite.com/😀.html"
let escapedStr = str.stringByAddingPercentEncodingWithAllowedCharacters(NSCharacterSet.URLPathAllowedCharacterSet())!
print(escapedStr)
// Output
// http%3A//www.mywebsite.com/%F0%9F%98%80.html
So short of removing and adding http:// manually, how can I properly escape those strings? There are other prefixes I must handle, like https://, ftp:// or ssh://.
: is not a legal character in the path part of a URL. You percent-encoded everything not in URLPathAllowedCharacterSet, so it shouldn't be surprising that the : was encoded.
Each part of a URL has different encoding rules. iOS can't correctly encode a URL until it knows what goes in what part, and it can't do that from an unencoded string (since it'd have to parse it first, and it can't parse it because it's not correctly encoded yet). In some systems (including older versions of iOS), it would use various heuristics that assumed "well, I guess you probably meant..." rather than actually following the URL-encoding rules. This was convenient for common cases, but it mis-encoded less common, yet legal, cases (especially involving non-http URLs and non-Latin URLs). iOS now follows the rules, so things encode consistently, but it means you need to actually think about URLs and not just throw random stuff at the system and hope it figures it out.
The best way to do this (if you have to compute this stuff dynamically) is with NSURLComponents:
let url = NSURLComponents()
url.scheme = "http"
url.host = "www.mywebsite.com"
url.path = "/😀.html"
url.string // "http://www.mywebsite.com/%F0%9F%98%80.html"
url.percentEncodedPath // "/%F0%9F%98%80.html"
url.URL // http://www.mywebsite.com/%F0%9F%98%80.html
// etc.
See also NSURLComponents.URLRelativeToURL if you have some base, static URL that you don't have to worry about dynamically encoding.
let baseURL = NSURL(string: "http://www.mywebsite.com")
let relative = NSURLComponents()
relative.path = "/😀.html"
let url = relative.URLRelativeToURL(baseURL)
url?.absoluteString
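For reference, here is a minimal sketch of the same idea in current Swift (URLComponents instead of NSURLComponents), covering the other schemes mentioned in the question. The helper name makeURLString is made up for illustration, and it assumes you already have the scheme, host, and path as separate, unencoded values.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): builds a URL string from
// unencoded parts. URLComponents percent-encodes each part by that part's rules.
func makeURLString(scheme: String, host: String, path: String) -> String? {
    var components = URLComponents()
    components.scheme = scheme
    components.host = host
    components.path = path   // unencoded; should start with "/" when a host is set
    return components.string
}

for scheme in ["http", "https", "ftp", "ssh"] {
    let result = makeURLString(scheme: scheme, host: "www.mywebsite.com", path: "/😀.html")
    print(result ?? "could not build URL")
}
// First line printed: http://www.mywebsite.com/%F0%9F%98%80.html
```

The scheme is never run through a percent-encoder, so the colon after it is left alone.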
You're confusing things: the special characters after the domain name need to be escaped using "percent encoding" (I don't think that's 100% the correct term), according to the HTTP standard.
The domain name itself can contain any Unicode code point (and the client should then apply Punycode to map it to a DNS name), and the URL scheme (http:) must not be escaped.
So, yes, you'll need to handle these parts of your URL differently; there is no way around that. Other protocols might require a different encoding of special characters than HTTP does. For example, the ssh: scheme is pretty application specific (SSH is just a family of secure transports, not a means of describing a uniform resource location), so it will probably treat non-ASCII characters very differently from HTTP, depending on what you actually mean by ssh: "URLs".
The fastest way to do it:
In the past, you would escape the string as UTF-8 with the following statement:
let str = "http://www.mywebsite.com/😀.html"
let escapedStr = str.stringByAddingPercentEscapesUsingEncoding(NSUTF8StringEncoding)
That method is now deprecated, so the equivalent in Swift 2.2 is:
let str = "http://www.mywebsite.com/😀.html"
let escapedStr = str.stringByAddingPercentEncodingWithAllowedCharacters(.URLQueryAllowedCharacterSet())
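Note that this character set is designed for the query component (the part after the question mark), but here it is applied to the whole string. It leaves characters such as : / ? & = untouched because they are legal in a query, and percent-encodes everything else, including the emoji.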
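To see what the query character set does to a complete URL string, here is a small sketch using the current Swift spelling of the same API (addingPercentEncoding(withAllowedCharacters:) and .urlQueryAllowed). The query and fragment on the sample URL are added here purely for illustration.

```swift
import Foundation

// Sample URL with a query and a fragment appended for illustration.
let str = "http://www.mywebsite.com/😀.html?q=a b#top"

// Applies the query rules to the WHOLE string, not just the query part.
let escaped = str.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)

print(escaped ?? "nil")
// http://www.mywebsite.com/%F0%9F%98%80.html?q=a%20b%23top
```

The scheme separator survives, which is why this looks like it works, but the # is encoded as %23, so the fragment is lost. That is one reason to treat this as a shortcut for simple URLs rather than a general solution.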
Related
I'm making requests to Twitter, using the OAuth1.0 signing process to set the Authorization header. They explain it step-by-step here, which I've followed. It all works, most of the time.
Authorization fails whenever special characters are sent without percent encoding in the query component of the request. For example, ?status=hello%20world! fails, but ?status=hello%20world%21 succeeds. But the change from ! to the percent encoded form %21 is only made in the URL, after the signature is generated.
So I'm confused as to why this fails, because AFAIK that's a legally encoded query string. Only the raw strings ("status", "hello world!") are used for signature generation, and I'd assume the server would remove any percent encoding from the query params and generate its own signature for comparison.
When it comes to building the URL, I let URLComponents do the work, so I don't add percent encoding manually, ex.
var urlComps = URLComponents()
urlComps.scheme = "https"
urlComps.host = host
urlComps.path = path
urlComps.queryItems = [URLQueryItem(name: "status", value: "hello world!")]
urlComps.percentEncodedQuery // "status=hello%20world!"
I wanted to see how Postman handled the same request. I selected OAuth1.0 as the Auth type and plugged in the same credentials. The request succeeded. I checked the Postman console and saw ?status=hello%20world%21; it was percent encoding the !. I updated Postman, because a nice little prompt asked me to. Then I tried the same request; now it was getting an authorization failure, and I saw ?status=hello%20world! in the console; the ! was no longer being percent encoded.
I'm wondering who is at fault here. Perhaps Postman and I are making the same mistake. Perhaps it's with Twitter. Or perhaps there's some proxy along the way that idk, double encodes my !.
The OAuth1.0 spec says this, which I believe is in the context of both client (taking a request that's ready to go and signing it before it's sent), and server (for generating another signature to compare against the one received):
The parameters from the following sources are collected into a
single list of name/value pairs:
The query component of the HTTP request URI as defined by
[RFC3986], Section 3.4. The query component is parsed into a list
of name/value pairs by treating it as an
"application/x-www-form-urlencoded" string, separating the names
and values and decoding them as defined by
[W3C.REC-html40-19980424], Section 17.13.4.
That last reference, here, outlines the encoding for application/x-www-form-urlencoded, and says that space characters should be replaced with +, non-alphanumeric characters should be percent encoded, name separated from value by =, and pairs separated by &.
So, the OAuth1.0 spec says that the query string of the URL needs to be decoded as defined by application/x-www-form-urlencoded. Does that mean that our query string needs to be encoded this way too?
It seems to me, if a request is to be signed using OAuth1.0, the query component of the URL that gets sent must be encoded in a way that is different to what it would normally be encoded in? That's a pretty significant detail if you ask me. And I haven't seen it explicitly mentioned, even in Twitter's documentation. And evidently the folks at Postman overlooked it too? Unless I'm not supposed to be using URLComponents to build a URL, but that's what it's for, no? Have I understood this correctly?
Note: ?status=hello+world%21 succeeds; it tweets "hello world!"
I ran into a similar issue.
Put the status in the POST body, not the query string.
Percent-encoding:
private encode(str: string): string {
  // encodeURIComponent() escapes all characters except: A-Z a-z 0-9 - _ . ! ~ * ' ( )
  // RFC 3986 section 2.3 Unreserved Characters (January 2005): A-Z a-z 0-9 - _ . ~
  // So additionally escape ! ' ( ) * to end up with only the RFC 3986 unreserved set.
  return encodeURIComponent(str)
    .replace(/[!'()*]/g, c => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}
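For the URLComponents setup in the question, the same strict encoding could be sketched in Swift roughly as follows. This assumes the rule from the OAuth 1.0 spec that only the RFC 3986 unreserved characters stay unencoded. The helper name strictEncode, the host, and the path are placeholders, not Twitter's API or anything from Foundation.

```swift
import Foundation

// RFC 3986 unreserved characters: the only ones left unencoded.
let unreserved = CharacterSet(charactersIn:
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

// Hypothetical helper: percent-encodes everything outside the unreserved set.
func strictEncode(_ value: String) -> String {
    return value.addingPercentEncoding(withAllowedCharacters: unreserved) ?? value
}

var urlComps = URLComponents()
urlComps.scheme = "https"
urlComps.host = "api.example.com"        // placeholder host
urlComps.path = "/statuses/update.json"  // placeholder path

// Build the query by hand and assign it as already-encoded text,
// so URLComponents does not apply its own, more lenient rules.
let params = [("status", "hello world!")]
urlComps.percentEncodedQuery = params
    .map { strictEncode($0.0) + "=" + strictEncode($0.1) }
    .joined(separator: "&")

print(urlComps.percentEncodedQuery ?? "")  // status=hello%20world%21
```

Assigning percentEncodedQuery instead of queryItems is what keeps the ! encoded as %21. With queryItems, URLComponents would leave it as a literal !.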
I am trying to make a web request to a URL that needs to keep accented characters instead of percent encoding them. E.g. é must NOT change to e%CC%81. I cannot change this.
These are the allowed characters that shouldn't be percent encoded: AaÁáBbCcDdEeÉéFfGgHhIiÍíJjKkLlMmNnOoÓóÖöŐőPpQqRrSsTtUuÚúÜüŰűVvWwXxYyZz0123456789-
Here is an example of a url I need
https://helyesiras.mta.hu/helyesiras/default/suggest?q=hány%20éves
You can try this URL in your web browser to confirm it's working. (The site is in Hungarian.) If you try the proper percent-encoded version of this URL (https://helyesiras.mta.hu/helyesiras/default/suggest?q=ha%CC%81ny%20e%CC%81ves) then the website will give an error. (Also in Hungarian.)
I have my custom encoder to get this URL string. However to make a web request I need to convert the String to URL.
I tried 2 ways:
URL(string:)
let urlStr = "https://helyesiras.mta.hu/helyesiras/default/suggest?q=hány%20éves"
var url = URL(string: urlStr)
// ERROR: Returns nil
URLComponents with percentEncodedQueryItems
var urlComponents = URLComponents()
urlComponents.scheme = "https"
urlComponents.host = "helyesiras.mta.hu"
urlComponents.path = "/helyesiras/default/suggest"
urlComponents.percentEncodedQueryItems = [ // ERROR: invalid characters in percent encoded query items
URLQueryItem(name: "q", value: "hány%20éves")
]
let url = urlComponents.url
Is it possible to create URLs without Foundation APIs checking its validity? Or can I create my own validation rules?
Safari is percent-encoding the URL. You're just percent-encoding it differently (and in a way your server is rejecting). What Safari sends to the server is:
GET /helyesiras/default/suggest?q=h%C3%A1ny%20%C3%A9ves HTTP/1.1
You can check that using Charles. Your website is behaving correctly and does not appear to require unencoded URLs.
It is not valid to send unencoded URLs, and Safari doesn't. There's no way to do it with URLSession either. You'd have to connect to the socket directly and build your own HTTP stack, which is quite possible, but I don't think you want to do that.
As Leo notes, the correct way to do this is using:
URLQueryItem(name: "q", value: "hány éves")
Replacing the %20 with the unencoded " " so that you don't double-encode the percent.
If you encode the string by hand, you'll find the same encoding:
print("hány éves".addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed))
// Optional("h%C3%A1ny%20%C3%A9ves")
(But you should use URLComponents. addingPercentEncoding is extremely error-prone.)
The preferred UTF-8 encoding of á is as the Unicode code point LATIN SMALL LETTER A WITH ACUTE (C3 A1). What you're encoding is LATIN SMALL LETTER A (61) followed by COMBINING ACUTE ACCENT (CC 81). I suspect your server is not applying Unicode normalization rules. While that's unfortunate, the fix is simple: use URLComponents with the precomposed string, and you'll get the same correct behavior as Safari.
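The difference between the two forms can be reproduced with a short sketch. It assumes the input string arrives in decomposed form (base letter plus combining accent), and it uses precomposedStringWithCanonicalMapping to normalize it before handing it to URLComponents. Whether your own input is decomposed depends on where it comes from.

```swift
import Foundation

// "hány éves" written with combining accents (a + U+0301, e + U+0301).
let decomposed = "ha\u{301}ny e\u{301}ves"
// The same text normalized to precomposed letters (U+00E1, U+00E9).
let precomposed = decomposed.precomposedStringWithCanonicalMapping

let encodedDecomposed = decomposed.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
let encodedPrecomposed = precomposed.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)
print(encodedDecomposed ?? "nil")   // ha%CC%81ny%20e%CC%81ves (the form the server rejects)
print(encodedPrecomposed ?? "nil")  // h%C3%A1ny%20%C3%A9ves (the form Safari sends)

// Building the request URL from the normalized value.
var components = URLComponents()
components.scheme = "https"
components.host = "helyesiras.mta.hu"
components.path = "/helyesiras/default/suggest"
components.queryItems = [URLQueryItem(name: "q", value: precomposed)]
print(components.url?.absoluteString ?? "nil")
// https://helyesiras.mta.hu/helyesiras/default/suggest?q=h%C3%A1ny%20%C3%A9ves
```

Swift treats the two strings as equal when compared with ==, so the difference only shows up in the encoded bytes.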
We have a GET request that sends string characters in the URL, so we use path variables to receive them. Apparently there is no way the calling service will change how it calls the backend, so we need to be able to accept a URL with the following unencoded characters:
When a percent sign (%) is sent, an HTTP 400 is returned. It does go through if the two characters following the % form a valid percent-encoded character.
A backslash is converted into a forward slash. I need it to stay a backslash.
I'm guessing these might be Tomcat or servlet configuration issues.
(spring boot version 1.5.14.RELEASE)
Percent signs (%) should be no problem if you properly URL encode them (%25). However, slashes and backslashes will not work with Tomcat, even if you encode them (%2F and %5C).
You could set the following properties when running the application:
-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true
However, this won't fix the issue, because in this case, those encoded slashes will be recognized as real ones. So, let's say you have the following controller:
@ResponseBody
@RequestMapping("/api/{foo}")
public String getFoo(@PathVariable String foo) {
    return foo;
}
Well, then if you call /api/test%5Ctest, it won't be able to find the correct path. A solution to this problem is to use wildcard matchers and to parse the URL itself from the incoming HttpServletRequest:
@RequestMapping("/api/**")
public String getFoo(HttpServletRequest request) {
    // ...
}
Another solution is to use a completely different web container. For example, when using Jetty, this isn't a problem at all, and URL encoded slashes and backslashes will both work.
Spring Security 5's StrictHttpFirewall blocks encoded percent signs by default. To allow them, define a bean that calls setAllowUrlEncodedPercent(true):
@Bean
public HttpFirewall allowEncodedParamsFirewall() {
    StrictHttpFirewall firewall = new StrictHttpFirewall();
    firewall.setAllowUrlEncodedPercent(true);
    return firewall;
}
There are similar setters for encoded forward slashes and for backslashes (setAllowUrlEncodedSlash and setAllowBackSlash).
What you are experiencing is not specific to Spring Boot. Instead, it's a restriction of HTTP.
The HTTP standard requires that any URL containing the percent characters must be decoded by the web server (cf page 36):
If the Request-URI is encoded using the "% HEX HEX" encoding [42], the
origin server MUST decode the Request-URI in order to properly
interpret the request.
As a result, it's not possible to escape the slash character reliably.
Therefore, when the slash is used in a URL, with or without encoding, it will be treated as a path separator. So it cannot be used in a Spring Boot path variable. Similar problems exist for the percent sign and the backslash.
Your best options are to use query parameters or a POST request.
In the following URL, the value test_with_/_and% is transmitted:
https://host/abc/def?text=test_with_%2F_and%25
final String path =
request.getAttribute(HandlerMapping.PATH_WITHIN_HANDLER_MAPPING_ATTRIBUTE).toString();
final String bestMatchingPattern =
request.getAttribute(HandlerMapping.BEST_MATCHING_PATTERN_ATTRIBUTE).toString();
String arguments = new AntPathMatcher().extractPathWithinPattern(bestMatchingPattern, path);
if (null != arguments && !arguments.isEmpty()) {
pattern = pattern + '/' + arguments;
}
I faced a similar problem and used the code above, so I hope it helps.
I'm writing app using Google custom search engine.
I received my search engine ID XXXXXXXX219143826571:7h9XXXXXXX (most interesting part bold).
Now I'm trying to use NSURLQueryItem to embed my ID into URL by using:
let parameters = ["cx" : engineID,...]
...
components.queryItems = parameters.map {
NSURLQueryItem(name: String($0), value: String($1))
}
It should percent-escape the item to XXXXXXXX219143826571%3A7h9XXXXXXX (this is the value I get when testing with the Google APIs Explorer, which shows the URL that was used). It is not doing that: I get the URL without escaping, unchanged. If I use the escaped string as the engine ID in this mapping, I get the doubly escaped string XXXXXXXX219143826571%253A7h9XXXXXXX (an additional '25' is added to the query).
Can someone tell me how to fix it? I don't want to use String and then convert it to URL by NSURL(string: str)!. It is not elegant.
Edit:
I'm using app Info.plist to save ID and I retrieve it by calling:
String(NSBundle.mainBundle().objectForInfoDictionaryKey("ApiKey")!)
Colons are allowed in the query part of a URL string. There should be no need to escape them.
Strictly speaking, the only things that absolutely have to be encoded in that part of a URL are ampersands, hash marks (#), and (assuming you're doing a GET query with form encoding) equals signs. However, question marks in theory may cause problems, slashes are technically not allowed (but work just fine), and semicolons are technically allowed (but again, work in practice).
Colons, AFAIK, only have special meaning in the context of paths (if the OS treats it as a path separator) and in that it separates the scheme (protocol) from the rest of the URL.
So don't worry about the colon being unencoded unless the Google API barfs for some reason.
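If the API does turn out to require %3A, one way to get it without double-encoding is to encode the value yourself and assign it through percentEncodedQuery. This is a sketch in current Swift (URLComponents and URLQueryItem rather than the NS-prefixed classes). The engine ID and the endpoint below are placeholders, not real values.

```swift
import Foundation

let engineID = "0123456789:abcdefg"  // placeholder, not a real engine ID

var components = URLComponents(string: "https://example.com/search")!  // placeholder endpoint

// Default behaviour: the colon is legal in a query, so it is left alone.
components.queryItems = [URLQueryItem(name: "cx", value: engineID)]
print(components.percentEncodedQuery ?? "")  // cx=0123456789:abcdefg

// Forcing the colon (and the query delimiters) to be encoded.
var allowed = CharacterSet.urlQueryAllowed
allowed.remove(charactersIn: ":&=+")
let encodedID = engineID.addingPercentEncoding(withAllowedCharacters: allowed) ?? engineID

// percentEncodedQuery takes already-encoded text, so the % is not encoded again.
components.percentEncodedQuery = "cx=" + encodedID
print(components.url?.absoluteString ?? "")
// https://example.com/search?cx=0123456789%3Aabcdefg
```

Passing the pre-escaped string through queryItems instead is what produces the %253A seen in the question, because queryItems expects unencoded values.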
I am attempting to use HttpUtility.UrlEncode to encode strings that are ultimately used in URLs.
example
/string/http://www.google.com
or
/string/my test string
where http://www.google.com is a parameter passed to a controller.
I have tried UrlEncode but it doesn't seem to work quite right
my route looks like:
routes.MapRoute(
"mStringView",
"mString/{sText}",
new { controller = "mString", action = "Index", sText = UrlParameter.Optional }
);
The problem is that the encoded bits seem to be decoded somewhere in the routing, except that things like "+", which replaces " ", are not decoded.
In my case a UrlParameter can be any string, including URLs. What is the best way to encode such values before pushing them into my db, and then to handle the decoding, knowing they will be passed to a controller as a parameter?
thanks!
It seems this problem has come up in other forums, and the general recommendation is to not rely on standard URL encoding for ASP.NET MVC. The reasoning is that URL encoding is not necessarily as user friendly as we want, and user-friendly URLs are one of the goals of custom routing. For example, this:
http://server.com/products/Goods+%26+Services
can be written in a friendlier way as
http://server.com/products/Goods-and-Services
So custom url encoding has advantages beyond working around this quirk/bug. More details and examples here:
http://www.dominicpettifer.co.uk/Blog/34/asp-net-mvc-and-clean-seo-friendly-urls
You could convert the parameter to byte array and use the HttpServerUtility.UrlTokenEncode
If the problem is that the "+" doesn't get decoded, use HttpUtility.UrlPathEncode to encode and the decoding will work as desired.
From the documentation of HttpUtility.UrlEncode:
You can encode a URL using with the UrlEncode method or the
UrlPathEncode method. However, the methods return different results.
The UrlEncode method converts each space character to a plus character
(+). The UrlPathEncode method converts each space character into the
string "%20", which represents a space in hexadecimal notation. Use
the UrlPathEncode method when you encode the path portion of a URL in
order to guarantee a consistent decoded URL, regardless of which
platform or browser performs the decoding.