Encoding URLs containing unicode characters - android-networking

Is there an Android class that (correctly) encodes URLs containing unicode characters? For example:
Blue Öyster Cult
Is converted to the following using java.net.URI:
uri.toString()
(java.lang.String) Blue%20Öyster%20Cult
The Ö character is not encoded. Using URLEncoder:
URLEncoder.encode("Blue Öyster Cult", "UTF-8").toString()
(java.lang.String) Blue+%C3%96yster+Cult
It encodes too much (i.e. spaces become "+" and path separators "/" become %2F). If I click on a link containing unicode characters with the Dolphin web browser it works correctly, so obviously this can be done. But if I try to open an HttpURLConnection using any of the above strings, I get an HTTP 404 Not Found exception.

I ended up hacking together a solution that seems to work for this, but is probably not the most robust:
URL url = new URL(userSuppliedPath);
String protocol = url.getProtocol();
String hostname = url.getHost();
String thePath = url.getPath();
int port = url.getPort(); // -1 when the URL has no explicit port
thePath = thePath.replaceAll("(^/|/$)", ""); // removes beginning/end slash
String encodedPath = URLEncoder.encode(thePath, "UTF-8"); // percent-encodes non-ASCII characters as UTF-8
encodedPath = encodedPath.replace("+", "%20"); // change + to %20 (space)
encodedPath = encodedPath.replace("%2F", "/"); // change %2F back to slash
String authority = (port == -1) ? hostname : hostname + ":" + port; // omit the port when none was given
String urlString = protocol + "://" + authority + "/" + encodedPath;

URLEncoder is designed to encode form content, not whole URIs. Encoding / as %2F is intentional, to prevent user input from being interpreted as a directory, and + is a valid encoding of a space in form data. (form data == the part of the URI following the ?)
Ideally, you would encode "Blue Öyster Cult" before appending it to your base URI, instead of encoding the whole string. And if "Blue Öyster Cult" is part of the path instead of part of the query string, you have to replace + with %20 yourself. With these restrictions, URLEncoder works fine.
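The difference between path encoding and form encoding can be sketched in Python (used here only for brevity; the question itself is about Java). This is a minimal illustration, not the answerer's code: the base URL is a made-up placeholder, `quote` stands in for path-segment encoding, and `quote_plus` stands in for the form-style rules that URLEncoder follows.

```python
from urllib.parse import quote, quote_plus

BASE = "http://example.com/artists/"  # hypothetical base URL, for illustration only
name = "Blue Öyster Cult"

# Path segment: spaces become %20 and non-ASCII characters become UTF-8 percent escapes.
# safe="" also escapes "/" so that user input cannot add path levels.
path_url = BASE + quote(name, safe="")

# Query string: form-style encoding, where a space becomes "+".
query_url = BASE + "?q=" + quote_plus(name)

print(path_url)
print(query_url)
```

Only the user-supplied piece is encoded, and it is encoded before being appended to the base URI, which is the approach the answer recommends.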

Related

Flutter Unicode Apostrophe In String

I'm hoping this is an easy question, and that I'm just not seeing the forest due to all the trees.
I have a string in Flutter that came from a REST API that looks like this:
"What\u0027s this?"
The \u is causing a problem.
I can't do a string.replaceAll("\", "\") on it as the single slash means it's looking for a character after it, which is not what I need.
I tried doing a string.replaceAll(String.fromCharCode(0x92), "") to remove it - That didn't work.
I then tried using a regex to remove it like string.replaceAll("/(?:\)/", "") and the same single slash remains.
So, the question is how to remove that single slash, so I can add in a double slash, or replace it with a double slash?
Cheers
Jase
I found the issue. I was looking for hex 92 (0x92) and it should have been decimal 92.
I ended up solving the issue like this...
String removeUnicodeApostrophes(String strInput) {
  // First remove the single backslash (character code 92).
  String strModified = strInput.replaceAll(String.fromCharCode(92), "");
  // Now, we can replace the rest of the unicode with a proper apostrophe.
  return strModified.replaceAll("u0027", "'");
}
When the string is read, I assume it is being treated as literal text rather than as the code point it represents, i.e. each character of \u0027 is a separate character. You may be able to fix this depending on how you access the API; see the dart:convert library. If you use utf8.decode on the raw data, or jsonDecode when the payload is JSON (it resolves \uXXXX escapes), you may be able to avoid this entire problem.
However, if that's not an option there's an easy enough solution for you.
What's happening when you write out your regex or replace is that you're not escaping the backslash, so it's essentially becoming nothing. If you use a double backslash, that solves the problem, as it escapes the escape character: "\\" => "\".
The other option is to use a raw string like r"\" which ignores the escape character.
Paste this into https://dartpad.dartlang.org:
void main() {
  // Case 1: the escape was decoded, so the string holds a real apostrophe.
  String withapostraphe = "What\u0027s this?";
  String withapostraphe1 = withapostraphe.replaceAll('\u0027', '');
  String withapostraphe2 = withapostraphe.replaceAll(String.fromCharCode(0x27), '');
  print("Original encoded properly: $withapostraphe");
  print("Replaced with nothing: $withapostraphe1");
  print("Using char code for ': $withapostraphe2");
  // Case 2: the escape was not decoded, so the string holds a literal backslash + u0027.
  String unicodeNotDecoded = "What\\u0027s this?";
  String unicodeWithApostraphe = unicodeNotDecoded.replaceAll('\\u0027', '\'');
  String unicodeNoApostraphe = unicodeNotDecoded.replaceAll('\\u0027', '');
  String unicodeRaw = unicodeNotDecoded.replaceAll(r"\u0027", "'");
  print("Data as read with escaped unicode: $unicodeNotDecoded");
  print("Data replaced with apostraphe: $unicodeWithApostraphe");
  print("Data replaced with nothing: $unicodeNoApostraphe");
  print("Data replaced using raw string: $unicodeRaw");
}
To see the result:
Original encoded properly: What's this?
Replaced with nothing: Whats this?
Using char code for ': Whats this?
Data as read with escaped unicode: What\u0027s this?
Data replaced with apostraphe: What's this?
Data replaced with nothing: Whats this?
Data replaced using raw string: What's this?
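For comparison, here is a rough Python sketch of the same idea in a more general form: instead of special-casing \u0027, it resolves any literal \uXXXX sequence. The variable names are illustrative only, and the JSON route assumes the text is otherwise valid inside a JSON string (no unescaped double quotes, for instance).

```python
import json
import re

raw = r"What\u0027s this?"  # literal backslash followed by u0027, as in the question

# Option 1: let a JSON parser resolve the escape.
# Assumes the text contains nothing that is invalid inside a JSON string.
decoded_json = json.loads('"' + raw + '"')

# Option 2: replace each \uXXXX sequence with the character it names.
# This simple version does not combine surrogate pairs.
decoded_regex = re.sub(
    r"\\u([0-9a-fA-F]{4})",
    lambda m: chr(int(m.group(1), 16)),
    raw,
)

print(decoded_json)
print(decoded_regex)
```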

Can't unescape escaped string with ABAP

I want to escape this string in SAPUI5 like this.
var escapedLongText = escape(unescapedLongText);
String (UTF-8 quote, space, Unicode quote)
" “
Escaped string
%22%20%u201C
I want to unescape it with this method, but it returns empty. Any ideas?
DATA: LV_STRING TYPE STRING.
LV_STRING = '%22%20%u201C'.
CALL METHOD CL_HTTP_UTILITY=>UNESCAPE_URL
EXPORTING
ESCAPED = LV_STRING
RECEIVING
UNESCAPED = LV_STRING.
I changed the code in SAPUI5 to the following:
var escapedLongText = encodeURI(unescapedLongText);
This results in the following (as andreas mentioned):
%22%20%e2%80%9c
If I want to decode it later in SAPUI5, it can be done like this:
var unescapedLongText = unescape(decodeURI(escapedLongText));
The unescape is needed because decodeURI leaves the escapes of reserved characters such as commas (%2C) untouched.
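The two escaped forms above differ in kind: %e2%80%9c is the UTF-8 percent-encoding of the quote character, while %u201C is a non-standard form produced by the legacy escape() function, which is presumably why a standard URL unescaper rejects it. A small Python sketch of the difference follows; `unescape_legacy` is a hypothetical helper written for this illustration, not part of any library.

```python
import re
from urllib.parse import quote, unquote

text = '" \u201c'  # ASCII quote, space, left double quotation mark

# Standard percent-encoding of the UTF-8 bytes (what encodeURI-style encoders emit).
utf8_escaped = quote(text, safe="")


def unescape_legacy(s):
    """Hypothetical helper: decode %uXXXX first, then plain %XX as Latin-1 code points."""
    s = re.sub(r"%u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), s)
    return unquote(s, encoding="latin-1")


print(utf8_escaped)
print(unquote("%22%20%e2%80%9c") == text)
print(unescape_legacy("%22%20%u201C") == text)
```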

BrowserWindow.Launch NOT decode string

In a Coded UI test, this line decodes the link before opening the browser:
myBrowser = BrowserWindow.Launch(New System.Uri("http://blablabla/main.aspx?sn=JFA%3d%3d"))
I want it to remain encoded in the url. In other words, when the browser opens, the %3d are already converted to "=" in the string.
http://mylinkhere/main.aspx?sn=JFA==
I want them to remain as %3d.
Thanks
Try this bro, this is in c# btw.
// Encode only the value, so the "=" in "sn=" stays a key/value separator.
String value = HttpUtility.UrlEncode("JFA%3d%3d");
window = BrowserWindow.Launch(new Uri("http://blablabla/main.aspx?sn=" + value));

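The idea behind encoding an already-encoded value is that the % sign itself gets escaped, so one round of decoding gives back the original %3d text instead of "=". A minimal Python sketch of that round trip, with illustrative variable names; whether this fits the Coded UI case depends on how many times the URL is decoded along the way, which is an assumption here.

```python
from urllib.parse import quote, unquote

value = "JFA%3d%3d"                # already percent-encoded value from the question
protected = quote(value, safe="")  # "%" itself becomes "%25"

once = unquote(protected)  # one decode restores the %3d form
twice = unquote(once)      # a second decode would finally produce "="

print(protected)
print(once)
print(twice)
```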
Method post: accents in my app's description, posted on a user's wall, show up as diamonds and question marks

I'm using an HTTP call to post information about my app on any user's wall. I do it this way:
_url = "https://graph.facebook.com/" + user_id + "/feed?message=MSG";
_url += "&access_token=" + access_token;
_url += "&picture=" + fb_app_url + "fb_icon.png";
_url += "&name=" + String_ToUrl("MY GAMES");
_url += "&link=" + "http://www.nlkgames.com";
_url += "&description=" + String_ToUrl("descripción with accent");
_url += "&method=post";
http.URL_CALL(_url);
This posts the information on the user's wall correctly, but the accented characters are shown as a diamond with a question mark inside it. I don't know how to make it work with accents.
String_ToUrl will encode string in this way:
"descripción with accent" = "descripci%E9n+with+accent"
What am I doing wrong? Why doesn't the user's Facebook wall recognize the URL-encoded text?
Instead of encoding it for URL, what happens if you just post the description as is?
If that doesn't work, try converting the string using HTML entities and then URL Encoding the description.
You could also try replacing those letters with their escaped representation (somewhat like \u123). That is what the Graph API gives you when there is such a special character (in my case it would be äöü from the German alphabet).
Your String_ToUrl function doesn't encode the URL in a Unicode-aware way.
ó (aka 'LATIN SMALL LETTER O WITH ACUTE' (U+00F3)) should become %C3%B3 in URL-encoded form, and not %F3 (you indicated that it became %E9, which is in fact the é character).
descripción -> descripci%C3%B3n
Be sure to use a proper (and Unicode-aware) way to encode your parameters; in JavaScript, for example, encodeURIComponent should be used...
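The difference between the two encodings can be checked with a short Python sketch (illustrative only; the original code is not Python). It encodes the same word once as Latin-1 and once as UTF-8, then shows what a UTF-8 decoder makes of the Latin-1 form: the replacement character U+FFFD, which is typically rendered as a diamond with a question mark.

```python
from urllib.parse import quote, quote_plus, unquote

word = "descripción"

latin1_form = quote(word, encoding="latin-1")  # single-byte escape for ó
utf8_form = quote(word)                        # UTF-8 is the default

# A receiver that expects UTF-8 cannot decode the lone byte F3,
# so it substitutes the replacement character U+FFFD.
garbled = unquote(latin1_form)

full = quote_plus("descripción with accent")

print(latin1_form)
print(utf8_form)
print(garbled)
print(full)
```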

What is the character encoding?

I have several characters that aren't recognized properly.
Characters like:
º
á
ó
(etc..)
This means that the character encoding is not UTF-8, right?
So, can you tell me what character encoding it could be, please?
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.
If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
Read this first: http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings involved: the one that was used to encode the string and the one that is used to decode it. They must be the same to get the expected result. If they are different, some characters will be displayed incorrectly. We can try to guess if you post the actual and expected results.
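A mismatch of this kind is easy to reproduce. The Python sketch below is only an illustration: it assumes the data was written as UTF-8 and read as Windows-1252, which is one common combination and not necessarily the asker's.

```python
original = "ºáó"

data = original.encode("utf-8")  # bytes as written by the producer

wrong = data.decode("cp1252")    # reader assumes Windows-1252: two characters per letter
right = data.decode("utf-8")     # reader uses the same encoding as the writer

# When the wrong guess is known, the damage can often be reversed.
repaired = wrong.encode("cp1252").decode("utf-8")

print(wrong)
print(right)
print(repaired)
```

Seeing pairs such as Ã¡ where á was expected is a strong hint that UTF-8 bytes were decoded with a single-byte encoding.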
I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        Encoding[] matches = FindEncodingTable('Ÿ');
        Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
    }

    // Locates all Encodings with the specified character at the specified position.
    // "CharacterPosition": decimal position of the character in the unknown encoding table, e.g. 159 in the extended ASCII table
    // "character": the character to locate in the encoding table, e.g. 'Ÿ' in the extended ASCII table
    static Encoding[] FindEncodingTable(int CharacterPosition, char character)
    {
        List<Encoding> matches = new List<Encoding>();
        byte myByte = (byte)CharacterPosition;
        byte[] bytes = { myByte };
        foreach (EncodingInfo encInfo in Encoding.GetEncodings())
        {
            Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
            char[] chars = thisEnc.GetChars(bytes);
            // No break here: keep scanning so that every matching encoding is collected.
            if (chars.Length > 0 && chars[0] == character)
            {
                matches.Add(thisEnc);
            }
        }
        return matches.ToArray();
    }

    // Locates all Encodings that contain the specified character.
    static Encoding[] FindEncodingTable(char character)
    {
        List<Encoding> matches = new List<Encoding>();
        foreach (EncodingInfo encInfo in Encoding.GetEncodings())
        {
            Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
            char[] chars = { character };
            byte[] temp = thisEnc.GetBytes(chars);
            // GetBytes never returns null; unmappable characters become a fallback such as '?',
            // so check that the bytes decode back to the original character.
            if (thisEnc.GetString(temp) == character.ToString())
                matches.Add(thisEnc);
        }
        return matches.ToArray();
    }
}
Encoding means transforming existing content into a form that the destination protocol can parse.
An example of encoding can be seen when browsing the internet:
The site you visit, www.example.com, may let you run custom searches via the URL:
www.example.com?search=...
The value after search= requires URL encoding. If you were to write:
www.example.com?search=cat food cheap
the URL would not be valid, because it contains an invalid character, ' ' (a white space).
To correct this, replace each ' ' with '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding. In this example I have used percent-encoding, the standard hex-based encoding for URLs. In other applications you may need other types of encoding.
Good Luck!