Google Sites API non-Latin language encoding issue

I'm programming against the Google Sites APIs.
My request bodies contain parameters such as the site title and category in non-Latin languages (Hebrew and Arabic).
After the requests complete, the site title and category name appear as ??? characters,
which is obviously an encoding problem.
I tried changing the request Content-Type header from application/atom+xml to application/atom+xml;charset=UTF-8,
but it didn't help.
Thanks in advance

Converting the characters to HTML numeric character references did the trick for me.
Online converter: http://succulent-plant.com/toys/asciitohtml.html
The C# function:
public string ConvertToHtmlEncoding(string input)
{
    // Build one "&#NNNN;" numeric character reference per character.
    var sb = new StringBuilder();
    foreach (char c in input)
    {
        sb.Append("&#").Append((int)c).Append(';');
    }
    return sb.ToString();
}
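The same numeric-character-reference trick is easy to reproduce in other languages; here is a Python sketch of the equivalent function (the function name is mine, not from any library):

```python
def to_html_numeric_refs(text: str) -> str:
    # Convert each character to an HTML numeric character reference
    # (&#NNNN;), the same approach as the C# function above.
    return "".join(f"&#{ord(ch)};" for ch in text)

print(to_html_numeric_refs("שלום"))  # Hebrew "shalom"
```

Because the references are pure ASCII, they survive any transport that mangles raw non-Latin bytes.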

Related

Storing Special Characters in Windows Azure Blob Metadata

I have an app that is storing images in a Windows Azure Block Blob. I'm adding meta data to each blob that gets uploaded. The metadata may include some special characters. For instance, the registered trademark symbol (®). How do I add this value to meta data in Windows Azure?
Currently, I get a 400 (Bad Request) error any time I try to upload a file that uses a special character like this.
Thank you!
You might use HttpUtility to encode/decode the string:
blob.Metadata["Description"] = HttpUtility.HtmlEncode(model.Description);
Description = HttpUtility.HtmlDecode(blob.Metadata["Description"]);
http://lvbernal.blogspot.com/2013/02/metadatos-de-azure-vs-caracteres.html
The characters in blob metadata must be ASCII. To work around this you can escape the string (percent-encode it), Base64 encode it, etc.
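The escape/Base64 workaround mentioned above can be sketched in Python; this is illustrative only (the helper names are mine, not part of any Azure SDK):

```python
import base64

def encode_metadata_value(value: str) -> str:
    # Base64 over the UTF-8 bytes is guaranteed to be plain ASCII,
    # so it is safe to store where only ASCII values are accepted.
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

def decode_metadata_value(value: str) -> str:
    # Reverse the transformation when reading the metadata back.
    return base64.b64decode(value).decode("utf-8")

encoded = encode_metadata_value("Registered ® trademark")
print(encoded)
print(decode_metadata_value(encoded))
```

The trade-off is that the stored value is no longer human-readable in the portal, so percent-encoding may be preferable when most characters are already ASCII.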
joe
HttpUtility.HtmlEncode may not work; if Unicode characters are in your string (e.g. &#8217;), it will fail. So far, I have found that Uri.EscapeDataString handles this edge case and others. However, a number of characters get encoded unnecessarily, such as the space (' ' = chr(32) = %20).
I mapped the ASCII characters that metadata will not accept and built this to restore the legal ones:
static List<string> illegals = new List<string> { "%1", "%2", "%3", "%4", "%5", "%6", "%7", "%8", "%A", "%B", "%C", "%D", "%E", "%F", "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17", "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F", "%7F", "%80", "%81", "%82", "%83", "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C", "%8D", "%8E", "%8F", "%90", "%91", "%92", "%93", "%94", "%95", "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E", "%9F", "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7", "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF", "%B0", "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9", "%BA", "%BB", "%BC", "%BD", "%BE", "%BF", "%C0", "%C1", "%C2", "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB", "%CC", "%CD", "%CE", "%CF", "%D0", "%D1", "%D2", "%D3", "%D4", "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD", "%DE", "%DF", "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6", "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF", "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8", "%F9", "%FA", "%FB", "%FC", "%FD", "%FE" };
private static string MetaDataEscape(string value)
{
    // Percent-encode everything, then restore every character that is
    // actually legal in metadata (anything not in the illegals list).
    var sz = Uri.EscapeDataString(value.Trim());
    for (int i = 1; i < 255; i++)
    {
        var hex = "%" + i.ToString("X");
        if (!illegals.Contains(hex))
        {
            sz = sz.Replace(hex, Uri.UnescapeDataString(hex));
        }
    }
    return sz;
}
The result is:
Before: "1080x1080 Facebook Images"
Uri.EscapeDataString: "1080x1080%20Facebook%20Images"
After: "1080x1080 Facebook Images"
I am sure there is a more efficient way, but the hit seems negligible for my needs.

POST method: accents in my app's description, posted on a user's wall, result in diamonds with question marks

I'm using an HTTP call to post information about my app to a user's wall. I do it this way:
_url = "https://graph.facebook.com/" + user_id + "/feed?message=MSG";
_url += "&access_token=" + access_token;
_url += "&picture=" + fb_app_url + "fb_icon.png";
_url += "&name=" + String_ToUrl("MY GAMES");
_url += "&link=" + "http://www.nlkgames.com";
_url += "&description=" + String_ToUrl("descripción with accent");
_url += "&method=post";
http.URL_CALL(_url);
This posts the information to the user's wall correctly, but the accents are shown as a diamond with a question mark inside. I don't know how to make it work with accents.
String_ToUrl will encode string in this way:
"descripción with accent" = "descripci%E9n+with+accent"
What is my mistake? Why doesn't the user's Facebook wall recognize the URL-encoded text?
Instead of encoding it for the URL, what happens if you just post the description as is?
If that doesn't work, try converting the string to HTML entities and then URL-encoding the description.
You could also try replacing those letters with the escaped Unicode representation (somewhat like \u123) - that's what the Graph API gives you when there is such a special character (in my case it would be äöü from the German alphabet).
Your String_ToUrl function doesn't URL-encode in a Unicode-aware way.
ó ('LATIN SMALL LETTER O WITH ACUTE', U+00F3) should become %C3%B3 in URL-encoded (UTF-8) form, not %F3 (in your case you indicated it became %E9, which is in fact the é character).
descripción -> descripci%C3%B3n
Be sure to use a proper (Unicode-aware) way to encode your parameters; in JavaScript, for example, encodeURIComponent should be used.
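For illustration, Python's urllib.parse.quote performs exactly this Unicode-aware percent-encoding over the UTF-8 bytes:

```python
from urllib.parse import quote, unquote

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character,
# so 'ó' (U+00F3) becomes the two-byte sequence %C3%B3.
print(quote("descripción"))         # descripci%C3%B3n
print(unquote("descripci%C3%B3n"))  # descripción
```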

How to set receivedDataEncoding for Big5 Chinese?

I'm having trouble receiving data from a Big5-encoded Chinese web page,
and I could not find sample code for Big5; what I have looks like this:
if ([encodingName isEqualToString:@"euc-jp"]) {
    receivedDataEncoding = NSJapaneseEUCStringEncoding;
} else {
    receivedDataEncoding = NSUTF8StringEncoding;
}
What should replace NSJapaneseEUCStringEncoding for the Big5 Chinese encoding?
Thanks for the answer first.
You can use the kCFStringEncodingBig5_E constant, which is available in
CoreFoundation/CFStringEncodingExt.h. (You can convert it to an NSStringEncoding with CFStringConvertEncodingToNSStringEncoding.)
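If you want to confirm that the incoming bytes really are Big5 before wiring up the Cocoa constant, Python's built-in 'big5' codec is a quick cross-check (this sketch is illustrative and not part of the Cocoa API):

```python
# Round-trip some Chinese text through Big5 to confirm the codec.
raw = "中文".encode("big5")
print(raw.decode("big5"))

# The same bytes are not valid UTF-8, which is why decoding with the
# wrong encoding produces replacement characters.
print(raw.decode("utf-8", errors="replace"))
```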

What is the character encoding?

I have several characters that aren't recognized properly.
Characters like:
º
á
ó
(etc.)
This means the character encoding is not UTF-8, right?
So, can you tell me what character encoding it could be, please?
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.
If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
Read this first: http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings involved: the one used to encode the string and the one used to decode it. They must be the same to get the expected result. If they are different, some characters will be displayed incorrectly. We can try to guess the encoding if you post the actual and expected results.
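A minimal Python sketch of the mismatch described above - encode with one encoding, decode with another, and the characters come out wrong:

```python
original = "ó"
# UTF-8 stores 'ó' as two bytes (0xC3 0xB3); reading those bytes back
# as Latin-1 turns each byte into its own (wrong) character.
misread = original.encode("utf-8").decode("latin-1")
print(misread)  # Ã³
```

This is exactly the kind of damage that produces the º/á/ó garbage shown in the question.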
I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.
static void Main(string[] args)
{
    Encoding[] matches = FindEncodingTable('Ÿ');
    Encoding[] matchesAtPosition = FindEncodingTable(159, 'Ÿ');
}

// Locates all encodings that map the given byte position to the specified character.
// "characterPosition": decimal position of the character in the unknown encoding table, e.g. 159 in the extended ASCII table.
// "character": the character to locate in the encoding table, e.g. 'Ÿ'.
static Encoding[] FindEncodingTable(int characterPosition, char character)
{
    List<Encoding> matches = new List<Encoding>();
    byte[] bytes = { (byte)characterPosition };
    foreach (EncodingInfo encInfo in Encoding.GetEncodings())
    {
        Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
        char[] chars = thisEnc.GetChars(bytes);
        if (chars.Length > 0 && chars[0] == character)
        {
            matches.Add(thisEnc);
        }
    }
    return matches.ToArray();
}
// Locates all encodings that can represent the specified character.
static Encoding[] FindEncodingTable(char character)
{
    List<Encoding> matches = new List<Encoding>();
    char[] chars = { character };
    foreach (EncodingInfo encInfo in Encoding.GetEncodings())
    {
        Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
        byte[] bytes = thisEnc.GetBytes(chars);
        // GetBytes never returns null; unmappable characters are replaced
        // by a fallback. Round-trip the bytes to verify the encoding
        // really contains the character.
        char[] roundTrip = thisEnc.GetChars(bytes);
        if (roundTrip.Length > 0 && roundTrip[0] == character)
            matches.Add(thisEnc);
    }
    return matches.ToArray();
}
Encoding transforms content so that it can be parsed by the destination protocol.
An example of encoding can be seen when browsing the internet:
The URL you visit, www.example.com, may offer a search facility that runs custom searches via the URL address:
www.example.com?search=...
Values on the URL require URL encoding. If you were to write:
www.example.com?search=cat food cheap
the browser wouldn't understand your request, because you have used an invalid character, ' ' (a white space).
To correct this encoding error you exchange each ' ' for '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding; in this example I have used standard percent (hex) encoding for a URL. In other applications and instances you may need other types of encoding.
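The space-to-%20 exchange above can be done with any standard URL-encoding routine; for example, in Python:

```python
from urllib.parse import quote

# quote() replaces characters that are invalid in a URL component,
# such as the space, with their percent-encoded form.
print(quote("cat food cheap"))  # cat%20food%20cheap
```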
Good Luck!

How to recover text from a wrong encoding?

I have some files created on Asian OS installs (Chinese and Japanese XP);
the file names are garbled, for example:
иè+¾«Ñ¡Õä²ØºÏ¼­
How can I recover the original text?
I tried this in C#:
Encoding unicode = Encoding.Unicode;
Encoding chinese = Encoding.GetEncoding(936);
byte[] chineseBytes = chinese.GetBytes(garbledString);
byte[] unicodeBytes = Encoding.Convert(unicode, chinese, chineseBytes);
// (then convert the bytes to a string)
and I tried changing unicode to windows-1252, but no luck.
It's double-encoded text. The original is in Windows-936; then some application assumed the text was in ISO-8859-1 and encoded the result to UTF-8. Here is an example of how to decode it in Python (2):
>>> print 'иè+¾«Ñ¡Õä²ØºÏ¼­'.decode('utf8').encode('latin1').decode('cp936')
新歌+精选珍藏合辑
I'm sure you can do something similar in C#.
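For reference, here is a Python 3 version of the same recovery (the snippet above relies on Python 2 byte strings); the garbled text is simulated here so the round trip is explicit:

```python
original = "新歌+精选珍藏合辑"

# Simulate the damage: CP936 (GBK) bytes misread as Latin-1 yield mojibake.
garbled = original.encode("cp936").decode("latin-1")

# Undo it: turn the mojibake back into the raw bytes, then decode as CP936.
recovered = garbled.encode("latin-1").decode("cp936")
print(recovered)
```

The key insight is that the garbled string is a lossless container for the original bytes, as long as the misdecoding step mapped every byte to some character (Latin-1 does).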
Encoding unicode = Encoding.Unicode;
That's not what you want. “Unicode” is Microsoft's totally misleading name for what is really the UTF-16LE encoding. UTF-16LE plays no part here, what you have is a simple case where a 936 string has been misdecoded as 1252.
Windows codepage 1252 is similar but not the same as ISO-8859-1. There is no way to tell which is in the example string as it does not contain any of the bytes 0x80-0x9F which are different in the two encodings, but I'm assuming 1252 because that's the standard codepage on a western Windows install.
Encoding latin = Encoding.GetEncoding(1252);
Encoding chinese = Encoding.GetEncoding(936);
chinese.GetChars(latin.GetBytes(s));
The first argument to Encoding.Convert is the source encoding. Shouldn't that be chinese in your case? So
Encoding.Convert(chinese, unicode, chineseBytes);
might actually work. Because, after all, you want to convert CP-936 to Unicode and not vice versa. And I'd suggest you don't even bother with CP-1252, since your text is very likely not Latin.
This is an old question, but I just ran into the same situation while trying to migrate WordPress upload files off of an old Windows Server 2008 R2 server. bobince's answer set me on the right track, but I had to search for the right encoding/decoding pair.
With the following C#, I found the relevant encoding/decoding pair:
using System;
using System.Text;
public class Program
{
public static void Main()
{
// garbled
string s = "2020竹慶本樂ä»æ³¢åˆ‡äºžæ´²æ³•ç­µ-Intro-2-1024x643.jpg";
// expected
string t = "2020竹慶本樂仁波切亞洲法筵-Intro-2-1024x643.jpg";
foreach( EncodingInfo ei in Encoding.GetEncodings() ) {
Encoding e = ei.GetEncoding();
foreach( EncodingInfo ei2 in Encoding.GetEncodings() ) {
Encoding e2 = ei2.GetEncoding();
var s2 = e2.GetString(e.GetBytes(s));
if (s2 == t) {
Console.WriteLine($"e1={ei.DisplayName} (CP {ei.CodePage}), e2={ei2.DisplayName} (CP {ei2.CodePage})");
Console.WriteLine(t);
Console.WriteLine(s2);
}
}
}
Console.WriteLine("-----------");
Console.WriteLine(t);
Console.WriteLine(Encoding.GetEncoding(65001).GetString(Encoding.GetEncoding(1252).GetBytes(s)));
}
}
It turned out that the correct encoding/decoding pair in my case was:
e1=Western European (Windows) (CP 1252), e2=Unicode (UTF-8) (CP 65001)
So the last line of the code is a one-liner for the correct conversion: Console.WriteLine(Encoding.GetEncoding(65001).GetString(Encoding.GetEncoding(1252).GetBytes(s)));