Storing Special Characters in Windows Azure Blob Metadata
I have an app that stores images in Windows Azure block blobs. I'm adding metadata to each blob that gets uploaded, and the metadata may include special characters, for instance the registered trademark symbol (®). How do I add such a value to blob metadata in Windows Azure?
Currently, I get a 400 (Bad Request) error whenever I try to upload a file whose metadata contains a character like this.
Thank you!
You might use HttpUtility to encode/decode the string:
blob.Metadata["Description"] = HttpUtility.HtmlEncode(model.Description);
Description = HttpUtility.HtmlDecode(blob.Metadata["Description"]);
http://lvbernal.blogspot.com/2013/02/metadatos-de-azure-vs-caracteres.html
Blob metadata values may only contain ASCII characters. To work around this, you can escape the string (for example, percent-encode it), Base64-encode it, etc.
joe
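A minimal sketch of the Base64 route, written in Python for brevity rather than C#: the value is first encoded as UTF-8 bytes and then as Base64, which yields plain ASCII that is safe to store, and reading it back reverses both steps. The helper names are hypothetical; in .NET the equivalent calls would be Encoding.UTF8.GetBytes with Convert.ToBase64String, and Convert.FromBase64String with Encoding.UTF8.GetString.

```python
import base64


def encode_metadata_value(value: str) -> str:
    """Hypothetical helper: UTF-8 encode, then Base64, giving an ASCII-only string."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_metadata_value(stored: str) -> str:
    """Hypothetical helper: reverse encode_metadata_value."""
    return base64.b64decode(stored.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    original = "Widget\u00ae"  # contains the registered trademark sign
    stored = encode_metadata_value(original)
    print(stored)
    print(decode_metadata_value(stored) == original)  # True
```

The trade-off is that Base64 values are unreadable when you inspect the blob's metadata directly and are about a third larger than the input bytes; percent-encoding keeps plain ASCII text readable.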
HttpUtility.HtmlEncode may not work: if your string contains characters such as ’ (U+2019), HtmlEncode leaves them as they are and the upload still fails. So far, I have found that Uri.EscapeDataString does handle this edge case and others. However, it also encodes a number of characters unnecessarily, such as space (' ' = chr(32) = %20).
Metadata will not accept control characters (0x01-0x1F, 0x7F) or anything outside ASCII, so after escaping with Uri.EscapeDataString I restore only the printable ASCII characters. '%' itself stays escaped as %25; otherwise an input such as "%41" would come back as "A" and the stored value could not be decoded reliably with Uri.UnescapeDataString:
using System;
static class BlobMetadata
{
    // Percent-encodes the value, then restores printable ASCII (0x20-0x7E) except '%'.
    // Control characters, DEL and the UTF-8 bytes of non-ASCII characters stay escaped.
    // Read the value back with Uri.UnescapeDataString(blob.Metadata["Description"]).
    public static string MetaDataEscape(string value)
    {
        // Example of Uri.EscapeDataString output before restoring:
        // CDC%20Guideline%20for%20Prescribing%20Opioids%20Module%206%3A%20%0Ahttps%3A%2F%2Fwww.cdc.gov%2Fdrugoverdose%2Ftraining%2Fdosing%2F
        var sz = Uri.EscapeDataString(value.Trim());
        for (int i = 0x20; i <= 0x7E; i++)
        {
            if (i == '%')
            {
                continue; // keep '%' as %25 so the result can be unescaped unambiguously
            }
            var hex = "%" + i.ToString("X2");
            sz = sz.Replace(hex, ((char)i).ToString());
        }
        return sz;
    }
}
The result is:
Before => "1080x1080 Facebook Images"
Uri.EscapeDataString => "1080x1080%20Facebook%20Images"
After => "1080x1080 Facebook Images"
I am sure there is a more efficient way, but the hit seems negligible for my needs.
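For comparison, here is a sketch of the same idea done in a single pass, in Python rather than C#: printable ASCII is kept as-is, and every other character (including '%' itself, so decoding stays unambiguous) is written as the percent-encoded bytes of its UTF-8 form. The function names are hypothetical; urllib.parse.unquote plays the role of Uri.UnescapeDataString when the value is read back.

```python
from urllib.parse import unquote


def metadata_escape(value: str) -> str:
    """Hypothetical single-pass variant of MetaDataEscape.

    Printable ASCII (0x20-0x7E) except '%' is kept; every other character is
    written as percent-encoded UTF-8 bytes, so the result is pure ASCII.
    """
    out = []
    for ch in value.strip():
        if 0x20 <= ord(ch) <= 0x7E and ch != "%":
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def metadata_unescape(stored: str) -> str:
    # unquote decodes %XX sequences as UTF-8 by default
    return unquote(stored)


if __name__ == "__main__":
    print(metadata_escape("1080x1080 Facebook Images"))  # 1080x1080 Facebook Images
    print(metadata_escape("Widget\u00ae 100%"))  # Widget%C2%AE 100%25
```

Building the result in one list avoids rescanning the whole string once per character code, which is what the loop of Replace calls does.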
Related
String transformation for subject course code for Dart/Flutter
To interact with an API, I need to pass the course code in <string><space><number> format, for example MCTE 2333, CCUB 3621 or BTE 1021. The text part can be 3 or 4 letters. Most users enter the code without the space, e.g. MCTE2333, but that causes an error in the API. How can I add a space between the letters and the numbers so that the code follows the correct format?
You can achieve the desired behaviour using regular expressions:
void main() {
  String a = "MCTE2333";
  String aStr = a.replaceAll(RegExp(r'[^0-9]'), ''); // extract the number
  String bStr = a.replaceAll(RegExp(r'[^A-Za-z]'), ''); // extract the letters
  print("$bStr $aStr"); // MCTE 2333
}
Note: this produces the same result regardless of how many spaces your user enters between the letters and the numbers.
Try this: use two text fields, one for the letters (e.g. MCTE) and one for the number (e.g. 1021), and set the keyboard type of the second field to numbers only. Then join the two strings with a space between them and send the result to your database. It's a bit of a hack, but it will work.
Scrolling down the course code list, I noticed some unusual formats with an extra letter at the end, for example TQB 1001E. This format doesn't work with Jahidul Islam's answer. However, inspired by his answer, I managed to come up with this logic:
void main() {
  var code = "TQB2001M";
  var i = code.indexOf(RegExp(r'[^A-Za-z]')); // index of the first non-letter
  var j = code.substring(0, i); // extract the letter prefix
  var k = code.substring(i).trim(); // extract the rest (number plus any suffix)
  var formatted = '$j $k'.toUpperCase(); // combine and capitalize
  print(formatted); // TQB 2001M
}
It works with the other formats too. You can try it out in DartPad.
Here is the entire logic you need (it also works for multiple spaces!):
void main() {
  String courseCode = "MMM 111";
  String parsedCourseCode = "";
  if (courseCode.contains(" ")) {
    // Split on runs of extra whitespace and rejoin with a single space
    final ensureSingleWhitespace = RegExp(r"(?! )\s+| \s+");
    parsedCourseCode = courseCode.split(ensureSingleWhitespace).join(" ");
  } else {
    final r1 = RegExp(r'[0-9]', caseSensitive: false);
    final r2 = RegExp(r'[a-z]', caseSensitive: false);
    final letters = courseCode.split(r1);
    final numbers = courseCode.split(r2);
    parsedCourseCode = "${letters[0].trim()} ${numbers.last}";
  }
  print(parsedCourseCode);
}
Play around with the input value (courseCode) to test it; you can also use DartPad. Just apply this logic to your user's input before submitting or otherwise handling the form :)
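Another option, sketched in Python rather than Dart: match the whole code with one regular expression that allows optional whitespace between the letters and the number plus an optional trailing letter (like the TQB 1001E codes mentioned above), and reject anything else. The pattern and function name are assumptions based on the formats described in this thread; Dart's RegExp supports the same syntax through firstMatch and group().

```python
import re
from typing import Optional

# Assumed format: 3-4 letters, optional whitespace, digits, optional one-letter suffix.
COURSE_CODE = re.compile(r"^\s*([A-Za-z]{3,4})\s*(\d+[A-Za-z]?)\s*$")


def normalize_course_code(raw: str) -> Optional[str]:
    """Return 'LETTERS NUMBER' in upper case, or None if raw doesn't look like a course code."""
    match = COURSE_CODE.match(raw)
    if match is None:
        return None
    letters, number = match.groups()
    return f"{letters.upper()} {number.upper()}"


if __name__ == "__main__":
    for raw in ["MCTE2333", "ccub   3621", "TQB1001E", "BTE 1021", "12AB"]:
        print(raw, "->", normalize_course_code(raw))
```

Returning None for input that doesn't match lets the form show a validation message instead of sending a malformed code to the API.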
How to display Unicode Smiley from json response dynamically in flutter
How can I display Unicode smileys from a JSON response dynamically in Flutter? They display properly when I declare the string statically, but when the text comes from a dynamic response the smileys are not displayed.
Static declaration (working):
child: Text("\ud83d\ude0e\ud83d\ude0eThis is just test notification..\ud83d\ude0e\ud83d\ude0e\ud83d\udcaf\ud83d\ude4c")
Dynamic response:
"message":"\\ud83d\\ude4c Be Safe at your home \\ud83c\\udfe0",
When I parse this response and pass it to Text, the Unicode escapes are treated as a plain string and displayed literally instead of as smileys. This is the code I use to display the text:
child: Text(_listData[index].message.toString().replaceAll("\\\\", "\\"))
I have already gone through this question, but its answer only works for a single Unicode escape, not for multiple ones. If anyone has displayed text with Unicode characters dynamically, please let me know.
Another good alternative for unescaping the characters is this. First:
String s = "\\ud83d\\ude0e Be Safe at your home \\ud83c\\ude0e";
String q = s.replaceAll("\\\\", "\\");
This prints the escape sequences literally instead of unescaping them:
\ud83d\ude0e Be Safe at your home \ud83c\ude0e
So you can either unescape them while parsing or use:
String convertStringToUnicode(String content) {
  String regex = "\\u";
  // indexOf returns -1 when no escape is left, which makes offset 1 and ends the loop
  int offset = content.indexOf(regex) + regex.length;
  while (offset > 1) {
    int limit = offset + 4;
    String str = content.substring(offset, limit); // the four hex digits
    if (str.isNotEmpty) {
      String uni = String.fromCharCode(int.parse(str, radix: 16));
      content = content.replaceFirst(regex + str, uni);
    }
    offset = content.indexOf(regex) + regex.length;
  }
  return content;
}
This replaces all the escape literals with the corresponding Unicode characters, so the emoji are output:
String k = convertStringToUnicode(q);
print(k);
😎 Be Safe at your home
Note: the other answer works just as well; this approach is for when you want your own unescape function and don't want to use third-party libraries. You can extend it with switch cases to handle other escape sequences.
Issue resolved using the code snippet below:
import 'dart:convert';
import 'package:http/http.dart';
Client client = Client();
final response = await client.get(Uri.parse('YOUR_API_URL'));
if (response.statusCode == 200) {
  // If the server returned a 200 OK response, parse the JSON.
  // Replace each double backslash with a single one first, so that
  // json.decode sees \uXXXX escapes and turns them into characters.
  final extractedData = json.decode(response.body.replaceAll("\\\\", "\\"));
}
Here we replace each double backslash with a single backslash and then decode the JSON response before setting it into Text. This way, multiple Unicode characters are displayed correctly.
Hope this answer helps others.
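The root cause in this question is that the JSON body escapes the backslash itself (\\ud83d), so after decoding you are left with the literal text \ud83d rather than an emoji. A sketch of a more targeted fix, in Python rather than Dart: decode the JSON normally, then turn each literal \uXXXX sequence into its UTF-16 code unit and let a UTF-16 round trip join surrogate pairs into real characters. The function name is hypothetical; unlike a blanket replaceAll on the raw body, it only touches \uXXXX sequences rather than every double backslash.

```python
import json
import re

# A literal backslash, 'u', and four hex digits
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text: str) -> str:
    """Replace literal \\uXXXX sequences with real characters (hypothetical helper)."""
    # Each escape becomes one UTF-16 code unit; emoji arrive as two surrogates.
    units = _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Round-trip through UTF-16 so adjacent surrogates are merged into one code point.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")


if __name__ == "__main__":
    # Raw body as sent by the API: the backslashes themselves are escaped.
    body = r'{"message":"\\ud83d\\ude4c Be Safe at your home \\ud83c\\udfe0"}'
    message = json.loads(body)["message"]
    print(message)  # \ud83d\ude4c Be Safe at your home \ud83c\udfe0
    print(unescape_unicode(message) == "\U0001F64C Be Safe at your home \U0001F3E0")  # True
```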
How to manually convert between Latin-5 and Unicode code points?
I have a socket connection and can read data from it. The data is encoded as Latin-5. For example:
// will get data from socket like:
_xCustomerName = "879255:_:NÝYAZÝ TOROS";
_xCustomerName.replaceAll(new RegExp(r'Ý'), 'İ');
But it doesn't seem to replace 'Ý' with 'İ'. How can I manually convert between Latin-5 and Unicode code points?
The Latin-5 characters are specified, e.g., here, but that looks like Cyrillic to me. What you need is probably code page 1254 (Windows Turkish) or ISO-8859-9, which is the standard actually named Latin-5 (Latin-9 is ISO-8859-15). Both have the Turkish capital dotted I (İ) at position 0xDD. The six-character change you mention in your follow-up post matches ISO-8859-9.
From the character table, you can build a translation table with one character per byte in the 0..255 range:
import "dart:typed_data";
// The Unicode characters of ISO-8859-9 (Latin-5), indexed by byte value.
const _latin5Table =
    "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
    "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
    " !\"#\$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
    "`abcdefghijklmnopqrstuvwxyz{|}~\x7f"
    "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
    "\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
    "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
    "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
    "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
    "\u011e\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\u0130\u015e\xdf"
    "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef"
    "\u011f\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\u0131\u015f\xff";
String decodeLatin5(List<int> bytes) {
  Uint16List charCodes = Uint16List(bytes.length);
  for (int i = 0; i < bytes.length; i++) {
    charCodes[i] = _latin5Table.codeUnitAt(bytes[i]);
  }
  return String.fromCharCodes(charCodes);
}
That allows decoding of ISO-8859-9. If you want to integrate it into the dart:convert framework, you may need to create a Converter around it, or even an Encoding. Sadly, package:convert doesn't have an easy way to create a code-page converter. I think it should.
I know which character codes differ in Latin-5 (Turkish). With a simple switch statement I can turn "879255::NÝYAZÝ TOROS" into "879255::NİYAZİ TOROS". I wish I had a better solution (some sort of Dart plugin), but this is the best I have come up with so far:
_xCustomerName.runes.forEach((int rune) {
  var character = new String.fromCharCode(rune);
  switch (character) {
    case 'Ð': character = 'Ğ'; break;
    case 'Ý': character = 'İ'; break;
    case 'Þ': character = 'Ş'; break;
    case 'ð': character = 'ğ'; break;
    case 'ý': character = 'ı'; break;
    case 'þ': character = 'ş'; break;
  }
  _xCustomerNameTurkish = _xCustomerNameTurkish + character;
});
print(_xCustomerNameTurkish);
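For comparison, Python ships a codec for ISO-8859-9 (iso8859_9), so the table-based approach above reduces to a single decode call there. The sketch below also shows the repair for text that has already been decoded as Latin-1, which is what produces NÝYAZÝ: re-encode it as Latin-1 to recover the original bytes, then decode those bytes correctly. That the socket bytes really are ISO-8859-9 is an assumption carried over from the question, and the function names are hypothetical.

```python
def decode_turkish(raw: bytes) -> str:
    """Decode ISO-8859-9 (Latin-5) bytes into a str."""
    return raw.decode("iso8859_9")


def repair_latin1_mojibake(text: str) -> str:
    """Undo a wrong Latin-1 decode: recover the original bytes, then decode as Latin-5."""
    return text.encode("latin-1").decode("iso8859_9")


if __name__ == "__main__":
    raw = b"879255:_:N\xddYAZ\xdd TOROS"  # 0xDD is the dotted capital I in Latin-5
    print(decode_turkish(raw) == "879255:_:N\u0130YAZ\u0130 TOROS")  # True
    print(repair_latin1_mojibake("N\u00ddYAZ\u00dd") == "N\u0130YAZ\u0130")  # True
```

The repair function covers exactly the six substitutions in the switch statement above (Ð Ý Þ ð ý þ), because those are the byte positions where Latin-1 and Latin-5 assign different letters.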
WP7's WebBrowser.NavigateToString() and text encoding
Does anyone know how to load a UTF-8-encoded string using the WebBrowser.NavigateToString() method? For now I end up with a bunch of incorrectly displayed characters. Here's a simple string that won't display correctly:
webBrowser.NavigateToString("ąęłóńżźćś");
The code file is saved with UTF-8 encoding (with signature). Thanks.
Using ConvertExtendedASCII as suggested works, but it is very slow. Using a StringBuilder instead was (in my case) about 800 times faster:
using System;
using System.Text;
public string FixHtml(string HTML)
{
    StringBuilder sb = new StringBuilder();
    char[] s = HTML.ToCharArray();
    foreach (char c in s)
    {
        // Replace every non-ASCII character with a numeric character reference
        if (Convert.ToInt32(c) > 127)
            sb.Append("&#" + Convert.ToInt32(c) + ";");
        else
            sb.Append(c);
    }
    return sb.ToString();
}
First, NavigateToString() expects a full HTML document. Second, since you're passing HTML, it's best to pass HTML entities rather than rely on encodings. Unfortunately, the browser doesn't support many named entity codes, so you should use numeric Unicode values where necessary, like this:
webBrowser1.NavigateToString("<html><body><p>&#243; &#213;</p></body></html>");
Try this article; it should help. In short, it proposes using the following snippet to convert your string into the appropriate format:
private static string ConvertExtendedASCII(string HTML)
{
    string retVal = "";
    char[] s = HTML.ToCharArray();
    foreach (char c in s)
    {
        if (Convert.ToInt32(c) > 127)
            retVal += "&#" + Convert.ToInt32(c) + ";";
        else
            retVal += c;
    }
    return retVal;
}
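The same numeric-entity conversion, sketched in Python rather than C#. Python strings are sequences of code points, so an emoji becomes a single reference such as &#128512;, whereas the char-by-char C# loops in these answers emit one reference per UTF-16 surrogate half for characters outside the Basic Multilingual Plane. Python's built-in 'xmlcharrefreplace' error handler does the same job in one call. The function name is hypothetical.

```python
def to_numeric_entities(html: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) <= 127 else f"&#{ord(ch)};" for ch in html)


if __name__ == "__main__":
    text = "\u0105\u0119\u0142\u00f3"  # the first four Polish letters from the question
    print(to_numeric_entities(text))  # &#261;&#281;&#322;&#243;
    # Built-in equivalent using the 'xmlcharrefreplace' error handler:
    print(text.encode("ascii", "xmlcharrefreplace").decode("ascii"))
```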
If you have the UTF-8 data in memory in a byte array, you could try NavigateToStream with a MemoryStream rather than NavigateToString. If you can, make sure there is a BOM at the start of the UTF-8 buffer. Note that the string in the question is not a UTF-8 string. It is a UTF-16 string with some garbage in it: by placing zeros between the bytes and storing it in a System.String, you corrupted it.
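A small sketch of the byte-level side of this answer, in Python rather than C#: encoding with the 'utf-8-sig' codec prepends the UTF-8 byte order mark (EF BB BF), and the same codec strips it again when decoding. In .NET, new UTF8Encoding(true).GetPreamble() returns those same three bytes. Whether the WP7 browser honours the BOM is the answer's suggestion, not something this sketch verifies, and the function name is hypothetical.

```python
import codecs


def to_utf8_with_bom(html: str) -> bytes:
    """Encode a document as UTF-8 with a leading byte order mark."""
    return html.encode("utf-8-sig")


if __name__ == "__main__":
    doc = "<html><body><p>\u0105\u0119\u0142</p></body></html>"
    buffer = to_utf8_with_bom(doc)
    print(buffer[:3] == codecs.BOM_UTF8)  # True
    print(buffer.decode("utf-8-sig") == doc)  # True
```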
What is the character encoding?
I have several characters that aren't recognized properly, such as º, á and ó. Does this mean that the character encoding is not UTF-8? If so, can you tell me what the encoding could be?
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad. If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
Read this first: http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings involved: the one used to encode the string and the one used to decode it. They must be the same to get the expected result; if they differ, some characters will be displayed incorrectly. We can try to guess if you post the actual and expected results.
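To illustrate the two-encodings point with the characters from the question, here is a Python sketch; choosing Windows-1252 as the mismatched decoder is an assumption for illustration. Encoding º á ó as UTF-8 and decoding the bytes as Windows-1252 produces the typical Âº Ã¡ Ã³ garbage, while using the same encoding on both sides gives the text back.

```python
text = "\u00ba \u00e1 \u00f3"  # º á ó

data = text.encode("utf-8")    # bytes written by one side
wrong = data.decode("cp1252")  # read back with a different encoding
right = data.decode("utf-8")   # read back with the same encoding

print(ascii(wrong))   # '\xc2\xba \xc3\xa1 \xc3\xb3'
print(right == text)  # True
```

Seeing pairs like Ã¡ where you expected á is a strong hint that UTF-8 bytes are being read with a single-byte code page.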
I wrote a couple of methods a while back to narrow down the possibilities in situations just like this:
using System;
using System.Collections.Generic;
using System.Text;
static void Main(string[] args)
{
    Encoding[] matches = FindEncodingTable('Ÿ');
    Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
}
// Locates all encodings that map the specified byte position to the specified character.
// "CharacterPosition": decimal position of the character in the unknown encoding table, e.g. 159 in the extended ASCII table.
// "character": the character to locate in the encoding table, e.g. 'Ÿ' in the extended ASCII table.
static Encoding[] FindEncodingTable(int CharacterPosition, char character)
{
    List<Encoding> matches = new List<Encoding>();
    byte[] bytes = { (byte)CharacterPosition };
    foreach (EncodingInfo encInfo in Encoding.GetEncodings())
    {
        Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
        char[] chars = thisEnc.GetChars(bytes);
        if (chars.Length > 0 && chars[0] == character)
        {
            matches.Add(thisEnc); // keep searching: collect every matching encoding
        }
    }
    return matches.ToArray();
}
// Locates all encodings that can represent the specified character.
static Encoding[] FindEncodingTable(char character)
{
    List<Encoding> matches = new List<Encoding>();
    foreach (EncodingInfo encInfo in Encoding.GetEncodings())
    {
        Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
        char[] chars = { character };
        // Encoders silently substitute '?' (or a best-fit character) for characters they
        // cannot represent, so check that the character survives a round trip instead.
        byte[] temp = thisEnc.GetBytes(chars);
        if (thisEnc.GetString(temp) == character.ToString())
            matches.Add(thisEnc);
    }
    return matches.ToArray();
}
Encoding is a way of transforming existing content so that it can be parsed by the protocols at its destination. An example of encoding can be seen when browsing the internet:
The URL you visit, www.example.com, may have a search facility that runs custom searches via the URL address: www.example.com?search=...
Variables in the URL require URL encoding. If you were to write:
www.example.com?search=cat food cheap
the browser wouldn't understand your request, because you have used an invalid character: ' ' (a space). To correct this, replace each ' ' with '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding; in this example I have used the standard hex (percent) encoding for URLs. In other applications you may need other types of encoding. Good luck!
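The same example as a short Python sketch: urllib.parse.quote percent-encodes the space as %20. Note that urlencode, which builds a whole query string from key/value pairs, follows the HTML form convention and writes spaces as '+' instead; both forms are common in query strings.

```python
from urllib.parse import quote, urlencode

query = "cat food cheap"

# Percent-encode a single value: spaces become %20
print("www.example.com?search=" + quote(query))  # www.example.com?search=cat%20food%20cheap

# Build a full query string: form encoding writes spaces as '+'
print("www.example.com?" + urlencode({"search": query}))  # www.example.com?search=cat+food+cheap
```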