Description encoding error with YouTube API v3

I've successfully created a project to upload YouTube videos programmatically through VB.NET, and it had worked for some weeks until today.
I'm having trouble uploading videos which contain German umlauts in the description field: as soon as I try to upload such a video, I'm getting the following WebException:
System.Exception: Bad Request ---> System.Net.WebException:
If I remove the description field or the umlauts, the upload works without problems.
I've also tried to UTF-8-encode the string, but without success.
The error first occurred today...

I had the very same error today: it was occurring with Japanese and Korean, while English and Chinese/Taiwanese were fine.
At first I thought it was UTF-8 related. A few hours later, I found out that YouTube does not rely on ISO 3166-1 country codes here; defaultAudioLanguage expects ISO 639-1 language codes. You can get their list there.
Replacing 'jp' with 'ja' and 'kr' with 'ko' in defaultAudioLanguage fixed the issue.
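If you want to guard against this in code, the fix amounts to a small code-table correction. A minimal sketch in Python (the helper name is mine, and the mapping covers only the two codes mentioned above):

# Hypothetical helper: map ISO 3166-1 country codes to the ISO 639-1
# language codes that defaultAudioLanguage actually expects.
COUNTRY_TO_LANGUAGE = {"jp": "ja", "kr": "ko"}

def fix_audio_language(code):
    # Return the ISO 639-1 code, leaving already-correct codes untouched.
    return COUNTRY_TO_LANGUAGE.get(code, code)

print(fix_audio_language("jp"))  # -> "ja"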

The issue is that the special characters can't be carried through an HTTP request as-is. So why not write a converter that searches for the umlaut characters and converts them to characters that can be parsed, for example:
ä -> a
ë -> e
ö -> o
û -> u
etc...
That would be the simplest way to do it, although you might be able to get away with switching to some encoding that silently drops them for you, then switching back to the default to build the request.
I would play around with the different encodings available in VB.NET and see what you can get.
Here is some documentation on the encodings available in .NET, how to UTF-8 encode strings in VB.NET, and the Encoding class reference:
http://msdn.microsoft.com/en-us/library/ms404377.aspx
vb.net - Encode string to UTF-8
http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx?cs-save-lang=1&cs-lang=vb#code-snippet-1
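A minimal sketch of that converter idea (in Python rather than VB.NET, purely to illustrate the approach): decompose each character, then drop the combining marks, so ä becomes a, ö becomes o, and so on.

import unicodedata

def strip_diacritics(text):
    # NFKD splits accented characters into base letter + combining mark
    # (e.g. "ä" -> "a" + U+0308); keep only the base letters.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("zerstörte"))  # -> "zerstorte"

In VB.NET the same effect can be had with String.Normalize(NormalizationForm.FormKD) followed by filtering out characters whose Unicode category is NonSpacingMark.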

Related

Uploading Amazon Inventory UTF 8 Encoding

I am trying to upload my English inventory to various European Amazon sites. The issue I am having is that the accented characters found in certain languages are not displayed correctly when an "inventory file" is uploaded to Amazon. The inventory file is a tab-delimited text file.
current setup:
$type = 'text/tab-separated-values; charset=utf-8';
header('Content-Type: '.$type);
header('Content-Disposition: attachment; filename="inventory-'.$_GET['cc'].'.txt"');
header('Content-Length: ' . strlen($data));
header('Content-Encoding: UTF-8');
When the text file is output and saved, it looks exactly how it should when opened in Windows (all the characters are correct), but for some reason Amazon doesn't see it as UTF-8 and re-encodes it, producing all of the characters found here:
http://www.i18nqa.com/debug/utf8-debug.html
I have tried adding the BOM to the top of the file but this just results in amazon giving an error. Has anyone else experienced this?
As @fvu pointed out in his comment, Amazon is expecting the ISO-8859-1 format, not UTF-8. That's why you should use PHP's utf8_decode function when writing to your file.
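For reference, utf8_decode is essentially a transcode from UTF-8 to ISO-8859-1. The same step, sketched in Python 3 just to make the byte-level effect visible:

text = "Müller"                            # decoded text
utf8_bytes = text.encode("utf-8")          # b'M\xc3\xbcller' -- what the PHP script had
latin1_bytes = text.encode("iso-8859-1")   # b'M\xfcller' -- what Amazon expects
# Characters with no Latin-1 equivalent would raise UnicodeEncodeError here.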
OK, so after a lot of trying, it turns out that the characters needed to be decoded. I opened the text files in Excel and they seemed to encode themselves as weird characters like ü. Using PHP's utf8_decode turned them back into the correct characters, even though the text file showed them as the right characters... very confusing.
To anyone out there having difficulties with UTF-8: try decoding first.
Thanks for your help.

How to retrieve German Umlauts: u'\\nAm Boden zerst\\xf6rte Gladiator

I am pretty sure that this is a very basic question, but after hours of searching and many attempts to fix this myself, I still haven't made progress.
Umlauts in my JSON file are saved as shown in the title. I found lots of ways to go from ö -> \xf6, but how can I go the other way round and end up with a UTF-8 encoded file?
As per your comment, I'd assume you're using Python. When using json.load, pass it the utf-8 encoding parameter.
Look at the python documentation.
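A short round trip, assuming Python 3 (where the encoding is given to open rather than json.load) and hypothetical file names: the json module turns \u00f6-style escapes back into real characters on load, and ensure_ascii=False writes them out as literal UTF-8.

import json

with open("events.json", encoding="utf-8") as f:
    data = json.load(f)        # "zerst\u00f6rte" comes back as "zerstörte"

with open("events-utf8.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)   # writes ö, not \u00f6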

SBJson parser unhappy with [Ô]

I'm having issues finding out what's wrong with the json string I receive from http://www.hier-bin-ich-koenig.de/json/events to be able to parse it. It doesn't validate, at least not with jsonlint, but I don't know where the issue is. So of course SBJson is unhappy too.
I also don't understand where that [Ô] is coming from. I'd love to know if it's from the content or the code that's converting the content into json. Being able to find where the validation error is would be great.
The exact error sent by the tokeniser is:
JSONValue failed. Error is: Illegal start of token [Ô]
Your page includes a UTF-16 BOM (byte order mark), followed by a UTF-8 encoded document. You should drop the BOM entirely. It is not recommended for UTF-8 encoding.
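If you can't fix the producing side, you can also defensively strip a stray BOM before parsing. A sketch in Python (codecs.BOM_* are the standard byte-order-mark constants):

import codecs

def strip_bom(raw):
    # Drop a leading UTF-16 or UTF-8 byte order mark, if present.
    for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE, codecs.BOM_UTF8):
        if raw.startswith(bom):
            return raw[len(bom):]
    return raw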
I had the same problem when I was parsing a JSON string generated by a PHP page. I resolved it by using Notepad++:
1. Open the PHP file.
2. Menu -> Encoding -> Encode in UTF-8 without BOM.
3. Save.
That's done.

how to convert the old emoji encoding to the latest encoding in iOS5?

Sadly, after iOS 5 was finally released, I got reports from my users that they cannot log in.
There are emoji symbols in their names, and Apple changed the encoding of emoji.
Their usernames contain an old version of the emoji; how can I convert them to the new encoding?
Thanks!
To be specific: one emoji symbol, "tiger", is "\U0001f42f" in iOS 5 but "\ue050" in earlier iOS versions.
iOS 5 and OS X 10.7 (Lion) use the Unicode 6.0 standard ‘unified’ code points for emoji.
iOS 4 on SoftBank iPhones used a set of unofficial code points in the Unicode Private Use Area, which aren't compatible with any other systems. To convert from this format to proper Unicode 6.0 characters, you'll need to run a big lookup table from Softbank codes to unified code points over all your current data, and over new form data as it gets submitted. You might also want to do Unicode normalisation at this point, so that e.g. fullwidth letters match normal ASCII letters.
See for example this table from a library that does emoji conversion tasks for PHP.
Emoji in usernames though?
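A tiny sketch of that lookup-table conversion in Python, seeded with the one pair given in the question (a real table has hundreds of entries):

# Softbank private-use code point -> Unicode 6.0 unified code point.
SOFTBANK_TO_UNIFIED = {
    "\ue050": "\U0001f42f",  # tiger face
}

def convert_username(name):
    # Replace each legacy emoji character, leave everything else alone.
    return "".join(SOFTBANK_TO_UNIFIED.get(ch, ch) for ch in name)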
I had the same problem; after digging for hours, I finally found this answer, which works for me.
If you are using Rails as your server, this is all you need to do. No need to do anything in iOS/Xcode; just pass the NSString to the server without doing any UTF-8/16 encoding stuff.
Postgres stores the code correctly; it's just that when you send the JSON response back to your iOS client (assuming you do render json: @message), the JSON encoding has a problem.
You can test whether you are having this JSON encoding problem in your Rails console with a simple check:
test = {"smiley"=>"\u{1f604}"}
test.to_json
If it prints out "{\"smiley\":\"\uf604\"}" (notice the 1 is lost), then you have this problem, and the patch from the link will fix it.
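For comparison, an encoder that handles astral-plane characters correctly escapes them as a UTF-16 surrogate pair instead of losing digits. Python's json module, for instance:

import json
print(json.dumps({"smiley": "\U0001f604"}))
# {"smiley": "\ud83d\ude04"} -- a proper surrogate pair, nothing dropped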

UTF-8 incorrectly displayed in Lua/ Corona

In Lua, for an iPad Corona project, I'm requesting a UTF-8 server text file (containing Chinese characters) using network.request, but the result when displayed in the console or in the app shows as "garbage". Google Chrome, for instance, displays the same UTF-8 page fine, as I'm setting the http header when the server sends this (using PHP) to 'Content-Type: text/plain; charset=utf-8' (and there's no BOM, byte order mark either). The "garbage" I'm seeing in Lua looks similar to when I "force" Chrome to render the page as ISO-8859-1 using the options menu.
Does anyone have any help or pointers?
If all else fails, how would I convert the "garbage" string back to its UTF-8 origins within Lua?
Thanks for any help!
Lua doesn't know anything about UTF-8; Lua strings are just sequences of bytes. It sounds like Corona itself is parsing the strings as ISO-8859-1. The most likely cause for this is them doing something really naive, like treating each byte of the string as a Unicode code point.
I'm afraid I don't know Corona, so can't provide any specific solutions, but I'd suggest looking to see what functions it's got that involve encodings --- there may be a specific function to render a string with a particular encoding, for example.
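On the "convert the garbage back" question: if the bytes were merely mis-decoded as ISO-8859-1, the damage is reversible, because Latin-1 maps all 256 byte values 1:1. A sketch of the round trip in Python (the same byte-for-byte idea can be hand-coded in Lua):

garbage = "ä½\xa0å¥½"   # the UTF-8 bytes of "你好" mis-read as ISO-8859-1
original = garbage.encode("iso-8859-1").decode("utf-8")
print(original)          # -> 你好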
Can you show the code for your network.request() call?
If you're downloading an HTML page, you should use network.download().
I had this exact same problem, except with Japanese characters. Although Lua doesn't support UTF-8, Corona acts like it does. What that means is that... if you pass a UTF-8 String to display.newText(...), it should display properly. Now, if you output to the console, it will actually print out the raw bytes of the String. And, if you try to print the length of the string, it will actually print out the number of bytes.
So, in summary, Lua treats all strings as an array of bytes. It knows nothing about UTF-8. Some Corona API methods, when passed UTF-8 strings, will display the strings correctly.
I had issues when I mixed UTF-8 with plain ASCII characters, which I believe confused Corona (what I mean is that I mixed English characters with Japanese characters... still all UTF-8, though). I have a hunch that each character in the string must be of the same length in bytes for Corona to display it properly. Try printing out one character at a time to see if that helps. Please feel free to post comments here if you run into trouble. I'd like to figure this issue out myself, too.