Special characters in WCF web API URLs - special-characters

I have a web service that uses the WCF Web API to expose a RESTful service. This service expects many different values in the URL path, separated by commas. This works perfectly for simple data, e.g. someone's name or a numeric value. However, I have a field on the client side (a Java-based BlackBerry app) that lets the user type free text, which can include characters such as . or / that break the whole URL.
Even when I replace the characters with their hex values e.g. a / to %2F the problem persists.
Does anyone know a means to either represent these characters in a URL which will be ignored when looking for the address or better yet a means to tell the URL the following characters are to be ignored perhaps in the way quotation marks would work?

You can use a URL-encoding function. URL encoding is the process of converting a string into a valid URL format, meaning the URL contains only what are termed alpha | digit | safe | extra | escape characters.
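As a minimal sketch of what such a function does, here is Python's standard urllib.parse.quote applied to free-text values before they are joined into a comma-separated path. The helper name encode_path_value and the sample values are made up for illustration. This only covers the client-side encoding; whether the server's routing accepts an encoded slash in the path is a separate matter, as the question observes.

```python
from urllib.parse import quote, unquote


def encode_path_value(value: str) -> str:
    """Percent-encode one value so it can sit in a single URL path segment.

    safe="" makes quote() encode "/" and "," as well; by default quote()
    leaves "/" untouched.
    """
    return quote(value, safe="")


# Hypothetical free-text values typed by a user.
values = ["AC/DC", "v1.2, beta"]
encoded = [encode_path_value(v) for v in values]
path = ",".join(encoded)  # literal commas separate the encoded values
print(path)

# Splitting on the literal commas and decoding each part restores the input.
decoded = [unquote(part) for part in path.split(",")]
```

Because each value is encoded before joining, a comma or slash inside a value can no longer be confused with a separator.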

Related

How to get the same utf-8 encoding as Google for Arabic URLs?

Google: https%3A%2F%2Fwww.aljazeera.net%2Fnews%2Fhealthmedicine%2F2019%2F4%2F29%2F%25D9%2584%25D8%25AD%25D8%25AF%25D9%2588%25D8%25AB-%25D8%25A7%25D9%2584%25D8%25AD%25D9%2585%25D9%2584-%25D8%25A3%25D9%2588-%25D8%25AA%25D8%25AC%25D9%2586%25D8%25A8%25D9%2587-%25D9%2587%25D9%2583%25D8%25B0%25D8%25A7-%25D8%25AA%25D8%25AD%25D8%25AA%25D8%25B3%25D8%25A8%25D9%258A%25D9%2586-%25D8%25A3%25D9%258A%25D8%25A7%25D9%2585-%25D8%25A7%25D9%2584%25D8%25AA%25D8%25A8%25D9%2588%25D9%258A%25D8%25B6
Encoding with utf-8, I get the below: https%3A%2F%2Fwww.aljazeera.net%2Fnews%2Fhealthmedicine%2F2019%2F4%2F29%2F%D9%84%D8%AD%D8%AF%D9%88%D8%AB-%D8%A7%D9%84%D8%AD%D9%85%D9%84-%D8%A3%D9%88-%D8%AA%D8%AC%D9%86%D8%A8%D9%87-%D9%87%D9%83%D8%B0%D8%A7-%D8%AA%D8%AD%D8%AA%D8%B3%D8%A8%D9%8A%D9%86-%D8%A3%D9%8A%D8%A7%D9%85-%D8%A7%D9%84%D8%AA%D8%A8%D9%88%D9%8A%D8%B6
How can I get the same URLs as Google's?
In Python I've used the following method to utf-8 encode the Arabic url:
urllib.parse.quote(url.encode('utf-8'), safe='')
This gives the second encoded URL above, which ends with D8%B6. Google's, however, ends with D8%25B6.
If I copy-paste the Arabic URL from one browser window to another, I get a URL encoding similar to mine, not the Google one:
The way I understand your question, you have a URL such as (from an Al Jazeera page in this case):
https://www.aljazeera.net/news/healthmedicine/2019/4/29/%D9%84%D8%AD%D8%AF%D9%88%D8%AB-%D8%A7%D9%84%D8%AD%D9%85%D9%84-%D8%A3%D9%88-%D8%AA%D8%AC%D9%86%D8%A8%D9%87-%D9%87%D9%83%D8%B0%D8%A7-%D8%AA%D8%AD%D8%AA%D8%B3%D8%A8%D9%8A%D9%86-%D8%A3%D9%8A%D8%A7%D9%85-%D8%A7%D9%84%D8%AA%D8%A8%D9%88%D9%8A%D8%B6
You then want to construct a Google Search Console URL for this page like:
https://search.google.com/search-console/performance/search-analytics?resource_id=sc-domain%3Aaljazeera.net&hl=ar&breakdown=page&page=!https%3A%2F%2Fwww.aljazeera.net%2Fnews%2Fhealthmedicine%2F2019%2F4%2F29%2F%25D9%2584%25D8%25AD%25D8%25AF%25D9%2588%25D8%25AB-%25D8%25A7%25D9%2584%25D8%25AD%25D9%2585%25D9%2584-%25D8%25A3%25D9%2588-%25D8%25AA%25D8%25AC%25D9%2586%25D8%25A8%25D9%2587-%25D9%2587%25D9%2583%25D8%25B0%25D8%25A7-%25D8%25AA%25D8%25AD%25D8%25AA%25D8%25B3%25D8%25A8%25D9%258A%25D9%2586-%25D8%25A3%25D9%258A%25D8%25A7%25D9%2585-%25D8%25A7%25D9%2584%25D8%25AA%25D8%25A8%25D9%2588%25D9%258A%25D8%25B6
So in short, you have a Google Search Console URL and want to add another URL as a query parameter.
Note that the Al Jazeera URL contains many non-ASCII characters that are properly encoded. In your browser's address bar, the URL will likely be displayed as
aljazeera.net/news/healthmedicine/2019/4/29/لحدوث-الحمل-أو-تجنبه-هكذا-تحتسبين-أيام-التبويض
That's not a valid URL but easier to read. When you copy the URL, you get the escaped one with ASCII characters only. That's the one you start with.
So the steps to create the Search Console URL are:
Run the Al Jazeera URL through URL encoding. Most programming languages provide such a function, or there are online services like https://www.urlencoder.org/
Append the result to the Google Search Console base URL (https://search.google.com/search-console/performance/search-analytics?resource_id=sc-domain%3Aaljazeera.net&hl=ar&breakdown=page&page=!)
That's it.
Note that the Search Console base URL has two peculiarities:
The page parameter starts with an exclamation mark, e.g. ...&page=!https%3A...
For a different domain, the URL needs to be changed as the domain name appears a second time in the URL.
Python code:
import urllib.parse
url = "https://www.aljazeera.net/news/healthmedicine/2019/4/29/%D9%84%D8%AD%D8%AF%D9%88%D8%AB-%D8%A7%D9%84%D8%AD%D9%85%D9%84-%D8%A3%D9%88-%D8%AA%D8%AC%D9%86%D8%A8%D9%87-%D9%87%D9%83%D8%B0%D8%A7-%D8%AA%D8%AD%D8%AA%D8%B3%D8%A8%D9%8A%D9%86-%D8%A3%D9%8A%D8%A7%D9%85-%D8%A7%D9%84%D8%AA%D8%A8%D9%88%D9%8A%D8%B6"
google_base_url = "https://search.google.com/search-console/performance/search-analytics?resource_id=sc-domain%3Aaljazeera.net&hl=ar&breakdown=page&page=!"
# safe='' also encodes '/', which quote() leaves untouched by default
final_url = google_base_url + urllib.parse.quote(url, safe='')
print(final_url)
Old answer
URL encoding is a tricky business because of mistakes in the encoding design, peculiarities of web servers, and mostly because several different cases are usually mixed up.
Also note that most browsers do not display a correct URL in the address bar, but rather a partially decoded, easier to read URL.
The main cases to distinguish are:
Insert data with non-ASCII characters into the path of a URL (e.g.: https://ttt.com/FANCY_CHARACTERS/...)
Add data with non-ASCII characters as a query parameter (e.g.: https://ttt.com/res/f?f=FANCY_CHARACTERS)
Your case seems to be a special version of case 2, namely adding a URL as a query parameter to another URL.
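To make the two cases concrete, here is a small Python sketch using the standard urllib.parse module. The sample text is invented, and the ttt.com URLs are just the placeholders from the list above.

```python
from urllib.parse import quote, urlencode

# Hypothetical data containing non-ASCII characters, a space, "&" and "/".
data = "caf\u00e9 & cr\u00e8me/1"

# Case 1: data as a path segment. safe="" so that "/" is encoded too.
path_url = "https://ttt.com/" + quote(data, safe="") + "/"

# Case 2: data as a query parameter. urlencode() uses form encoding,
# where a space becomes "+".
query_url = "https://ttt.com/res/f?" + urlencode({"f": data})

print(path_url)
print(query_url)
```

The two results differ only in how the space is written (%20 in the path, + in the query string), which is one of the details that tends to get mixed up.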
So let's assume you have a valid URL from whatever source. It already contains encoded characters.
https://www.aljazeera.net/news/healthmedicine/2019/4/29/%D9%84%D8%AD%D8%AF%D9%88%D8%AB-%D8%A7%D9%84%D8%AD%D9%85%D9%84-%D8%A3%D9%88-%D8%AA%D8%AC%D9%86%D8%A8%D9%87-%D9%87%D9%83%D8%B0%D8%A7-%D8%AA%D8%AD%D8%AA%D8%B3%D8%A8%D9%8A%D9%86-%D8%A3%D9%8A%D8%A7%D9%85-%D8%A7%D9%84%D8%AA%D8%A8%D9%88%D9%8A%D8%B6
If you want to add it to another URL, you just need to run it through URL encoding. You don't need to care about Unicode characters as they are already encoded. The URL contains ASCII characters only:
https%3A%2F%2Fwww.aljazeera.net%2Fnews%2Fhealthmedicine%2F2019%2F4%2F29%2F%25D9%2584%25D8%25AD%25D8%25AF%25D9%2588%25D8%25AB-%25D8%25A7%25D9%2584%25D8%25AD%25D9%2585%25D9%2584-%25D8%25A3%25D9%2588-%25D8%25AA%25D8%25AC%25D9%2586%25D8%25A8%25D9%2587-%25D9%2587%25D9%2583%25D8%25B0%25D8%25A7-%25D8%25AA%25D8%25AD%25D8%25AA%25D8%25B3%25D8%25A8%25D9%258A%25D9%2586-%25D8%25A3%25D9%258A%25D8%25A7%25D9%2585-%25D8%25A7%25D9%2584%25D8%25AA%25D8%25A8%25D9%2588%25D9%258A%25D8%25B6
You can now add this URL to another URL, e.g.:
https://fff.com/ttt/qqq?url=https%3A%2F%2Fwww.aljazeera.net%2Fnews%2Fhealthmedicine%2F2019%2F4%2F29%2F%25D9%2584%25D8%25AD%25D8%25AF%25D9%2588%25D8%25AB-%25D8%25A7%25D9%2584%25D8%25AD%25D9%2585%25D9%2584-%25D8%25A3%25D9%2588-%25D8%25AA%25D8%25AC%25D9%2586%25D8%25A8%25D9%2587-%25D9%2587%25D9%2583%25D8%25B0%25D8%25A7-%25D8%25AA%25D8%25AD%25D8%25AA%25D8%25B3%25D8%25A8%25D9%258A%25D9%2586-%25D8%25A3%25D9%258A%25D8%25A7%25D9%2585-%25D8%25A7%25D9%2584%25D8%25AA%25D8%25A8%25D9%2588%25D9%258A%25D8%25B6
Let me know if that's what you wanted to do...
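A short Python sketch of why the double encoding (%25D8...) is the correct form here: the receiving side decodes the query parameter once and gets back the already valid inner URL. The inner URL below is a shortened, hypothetical stand-in for the Al Jazeera one, and fff.com is the placeholder used above.

```python
from urllib.parse import quote, urlsplit, parse_qs

# Shortened stand-in for an already percent-encoded (ASCII-only) URL.
inner = "https://www.aljazeera.net/news/%D8%A7%D9%84%D8%AD%D9%85%D9%84"

# Encode the whole inner URL once more; every "%" becomes "%25".
outer = "https://fff.com/ttt/qqq?url=" + quote(inner, safe="")
print(outer)

# The receiver parses the query string, which decodes exactly one level.
recovered = parse_qs(urlsplit(outer).query)["url"][0]
```

If the inner URL were not encoded a second time, that single decoding step would turn %D8%A7... into raw bytes and the receiver would no longer see the original URL.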

Trouble with Login with PayPal redirect_uri mismatch

I am trying to configure my NetIQ SocialAccess appliance to allow authentication via Login with PayPal using OpenIDConnect but cannot seem to get my Return URL correct. I have seen a recent blog entry stating that the matching would become more strict and wonder if anyone can tell me if the difference in these two strings would cause the redirect_uri mismatch error. SocialAccess is adding a header with a redirect_uri string beginning with https%3A rather than https: as configured for my application's Return URL.
"%3A" is the percent-encoded form of the character ":", meaning SocialAccess is sending an encoded URL string as your redirect_uri, which leads to a mismatch with what you have set in your app config.
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.
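The mismatch can be illustrated with a few lines of Python. The Return URL below is hypothetical, and the sketch only shows the string comparison; how PayPal actually compares the two values is not stated here.

```python
from urllib.parse import quote, unquote

# Hypothetical Return URL as configured for the application.
configured = "https://example.com/callback"

# What a client that percent-encodes the whole value would send.
sent = quote(configured, safe="")
print(sent)

# Compared as raw strings the two differ; after one decode they are equal.
raw_match = sent == configured
decoded_match = unquote(sent) == configured
```

A strict character-by-character comparison therefore fails unless the value is decoded first.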

QNetworkRequest and automatic conversion of percent-encoded characters

I'm trying to download the audio samples from Amazon with the help of QNetworkAccessManager+QNetworkRequest+QNetworkReply. I've got a big problem in processing the redirect from, for example, http://www.amazon.com/gp/dmusic/aws/sampleTrack.html?clientid=Shazam&ASIN=B00DJBQWAE to http://d28julafmv4ekl.cloudfront.net/64%2F30%2F239068457_S64.mp3?Expires=1380627695&Signature=BlaBlaBlaBla&Key-Pair-Id=BlaBlaBla
(Note the percent-encoded path returned from the server). The problem is that when the redirect target URL is passed to a new QNetworkRequest and the request is sent via QNAM, the %2F characters are automatically converted to slashes. This seems to be correct behavior, BUT the server requires these slashes to remain encoded. Is there any way to disable this conversion?
Btw, QNetworkReply also has similar feature - it returns the redirect url with already converted %xx characters.
You can apply percent-encoding to this URL yourself. That way, '%2F' is encoded to '%252F', and QNetworkRequest decodes it back to '%2F'.
See this method: https://developer.blackberry.com/native/reference/cascades/qurl.html#toPercentEncoding
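The idea behind this workaround, sketched in Python rather than Qt (so this illustrates the encoding arithmetic only, not the QUrl API): encode the path once more up front so that a single automatic decoding step leaves the %2F intact.

```python
from urllib.parse import quote, unquote

# Path from the redirect target in the question, with encoded slashes.
path = "64%2F30%2F239068457_S64.mp3"

# Pre-encode: each "%" becomes "%25", so "%2F" becomes "%252F".
pre_encoded = quote(path, safe="")
print(pre_encoded)

# One decoding pass (standing in for what the client does automatically)
# yields the original path with "%2F" still encoded.
after_one_decode = unquote(pre_encoded)
```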

Httperf: How to test REST api with encoded uri

I want to test my REST API which has a URI something like this:
/myrestAPI/search?startTime=0&endTime=10&count=8&filters={"params":
[{"field":"Topic","value":"Algorithms","type":"MATCH_EXACT"}]}
How would I do that? httperf replies with status "505 HTTP Version Not Supported".
I know that httperf is not properly encoding this URI before sending it.
How would I achieve that in httperf?
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.
For your case, it would be:
/myrestAPI/search?startTime=0&endTime=10&count=8&filters=%7B%22params%22%3A%20%5B%7B%22field%22%3A%22Topic%22%2C%22value%22%3A%22Algorithms%22%2C%22type%22%3A%22MATCH_EXACT%22%7D%5D%7D
Try experimenting with a URL encoder/decoder.
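Instead of an online tool, the encoded URI can also be produced with a few lines of Python and then pasted into the httperf command line. This is a sketch under the assumption that the filter is compact JSON; unlike the string above, it contains no %20 because no space is emitted after "params":.

```python
import json
from urllib.parse import quote, unquote

# The filter object from the question.
filters = {
    "params": [
        {"field": "Topic", "value": "Algorithms", "type": "MATCH_EXACT"}
    ]
}

# Compact JSON (no spaces), then percent-encode everything that is not
# an unreserved character.
raw = json.dumps(filters, separators=(",", ":"))
encoded = quote(raw, safe="")

uri = "/myrestAPI/search?startTime=0&endTime=10&count=8&filters=" + encoded
print(uri)
```

Decoding the filters value with unquote() gives back the compact JSON string, which is a quick way to check the result before running the load test.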

What is &amp used for

Is there any difference in behaviour of below URL.
I don't know why the &amp; is inserted, does it make any difference ?
www.testurl.com/test?param1=test&amp;current=true
versus
www.testurl.com/test?param1=test&current=true
& is HTML for "Start of a character reference".
&amp; is the character reference for "An ampersand".
&current; is not a standard character reference and so is an error (browsers may try to perform error recovery but you should not depend on this).
If you used a character reference for a real character (e.g. &trade;) then it (™) would appear in the URL instead of the string you wanted.
(Note that depending on the version of HTML you use, you may have to end a character reference with a ;, which is why &trade= will be treated as ™. HTML 4 allows it to be omitted if the next character is a non-word character (such as =) but some browsers (Hello Internet Explorer) have issues with this).
HTML doesn't recognize the & but it will recognize &amp; because it is equal to & in HTML
I looked over this post someone had made: http://www.webmasterworld.com/forum21/8851.htm
My Source: http://htmlhelp.com/tools/validator/problems.html#amp
Another common error occurs when including a URL which contains an
ampersand ("&"):
This is invalid:
<a href="foo.cgi?chapter=1&section=2&copy=3&lang=en">
Explanation:
This example generates an error for "unknown entity section" because
the "&" is assumed to begin an entity reference. Browsers often
recover safely from this kind of error, but real problems do occur in
some cases. In this example, many browsers correctly convert &copy=3
to ©=3, which may cause the link to fail. Since &lang; is the HTML
entity for the left-pointing angle bracket, some browsers also convert
&lang=en to 〈=en. And one old browser even finds the entity &sect;,
converting &section=2 to §ion=2.
So the goal here is to avoid problems when you are trying to validate your website. So you should be replacing your ampersands with &amp; when writing a URL in your markup.
Note that replacing & with &amp; is only done when writing the URL in
HTML, where "&" is a special character (along with "<" and ">"). When
writing the same URL in a plain text email message or in the location
bar of your browser, you would use "&" and not "&amp;". With HTML, the
browser translates "&amp;" to "&" so the Web server would only see "&"
and not "&amp;" in the query string of the request.
Hope this helps : )
That's a great example. When &current is parsed into a text node it is converted to ¤t. When parsed into an attribute value, it is parsed as &current.
If you want &current in a text node, you should write &amp;current in your markup.
The gory details are in the HTML5 parsing spec - Named Character Reference State
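The effect can be reproduced outside a browser with Python's standard html module, as a rough sketch. html.escape writes the ampersands the way they belong in markup, and html.unescape decodes character references roughly the way a text node is parsed, so it shows the kind of mangling described above. The URL is the foo.cgi example quoted earlier.

```python
import html

raw = "foo.cgi?chapter=1&section=2&copy=3&lang=en"

# What should be written inside the HTML attribute.
in_markup = html.escape(raw)
print(in_markup)

# A parser decoding the properly escaped markup recovers the intended URL.
round_trip = html.unescape(in_markup)

# Decoding the unescaped URL instead lets "&copy" be read as a character
# reference, so the query string is damaged.
mangled = html.unescape(raw)
print(mangled)
```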
If you are building the URL as a string in JavaScript rather than writing it in HTML markup, use a plain &:
let linkGoogle = 'https://www.google.com/maps/dir/?api=1';
let origin = '&origin=' + locations[0][1] + ',' + locations[0][2];
aNav.href = linkGoogle + origin;