taglib-sharp id3v2 wrong character encoding

I've got an MP3 with Cyrillic text in both the ID3v1 and ID3v2 tags. Windows and media players (AIMP) display both tags fine. However, when I parse the tags using taglib-sharp (I also tried a few similar libraries), I get garbled characters like Äåðæèñü. I know there can be issues with specific files, but since Windows and the players handle it correctly, there must be a way to decode it properly. How can I decode that?
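One possible explanation, offered as an assumption rather than a confirmed diagnosis: the tag text was written as Windows-1251 bytes, but the library decodes it as Latin-1 (ISO-8859-1), the nominal encoding for ID3v1 and for ID3v2 text frames without a Unicode flag. If that is the case, the garbling is reversible. A minimal Python sketch of the two-step repair (the function name is hypothetical; the same steps apply in any language):

```python
def fix_cp1251_mojibake(garbled: str) -> str:
    """Undo 'Windows-1251 bytes decoded as Latin-1' mojibake."""
    # Step 1: recover the original bytes. Latin-1 maps code points 0-255
    # to the same byte values, so this round-trips losslessly.
    raw = garbled.encode("latin-1")
    # Step 2: decode those bytes with the code page they were written in.
    return raw.decode("cp1251")


print(fix_cp1251_mojibake("Äåðæèñü"))  # the sample string from the question
```

For the sample string this yields readable Cyrillic (Держись), which supports the Windows-1251 guess for this particular file. Other files may use a different code page.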

Related

Facebook Like button is not working with Cyrillic domain

The Facebook Like button is not working with Cyrillic domains. I've tried different methods, but I still cannot see it.
What can I do to resolve it?
Did you try Punycode (for reference, see http://en.wikipedia.org/wiki/Punycode)? You might want to check the Punycode converter at http://mct.verisign-grs.com/ (the first Google result I got).
You could use a Punycode converter to turn the Cyrillic IDN (internationalized domain name) from Unicode into ASCII (the format readable by browsers), so it would look like:
Unicode пример.ru
ASCII xn--e1afmkfd.ru
I'm not sure if this will help, but try using the ASCII version of your Cyrillic URL in the Facebook code.
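The conversion above can also be done without an online converter. As a small sketch, Python's built-in "idna" codec performs it; the helper name below is only illustrative:

```python
def to_ascii_domain(domain: str) -> str:
    """Convert an internationalized domain name to its ASCII (Punycode) form."""
    # The built-in "idna" codec applies Punycode to each non-ASCII label
    # and leaves ASCII labels such as "ru" unchanged.
    return domain.encode("idna").decode("ascii")


print(to_ascii_domain("пример.ru"))  # xn--e1afmkfd.ru
```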

Unicode character change after retrieving from server in iPhone

I have an iPhone application in which I send the text from a UITextView to a server through a web service. On the next page I display the list of comments fetched from the server through the web service. Everything works fine except when I insert emoji/emoticons in the UITextView.
The next page displays square boxes instead of some emoji characters (not all).
I have noticed that:
Working: I have inserted one emoji character in UITextview from an emoji keyboard and printed its code in log, \u2601, and submitted this text to the server. In the next page I got the same unicode \u2601 and it's working fine. It shows me the emoji icon.
Not Working: Now I have inserted another emoji character in the UITextview from the emoji keyboard and printed its code in log, \ud83d\udc16, and submitted this text to the server. In the next page I got a unicode codepoint which is different from what I sent: \uf416. iPhone doesn't recognize this unicode so it gives me a square box.
So what is the problem here? It fails only when the emoji character is represented by a pair of \u escapes.
The database in which the comment is stored is MySQL version 5.5.
Why does the emoji character code pair change when retrieved from the server? How can I decode the retrieved Unicode into its original form so the iPhone can recognize it?
The character U+F416 is in one of the Private Use Areas (PUA).
Most likely, some code on the way from the app to the database and back replaced the emoji with this character. This could be any of the components along the way, e.g. the client library that you use to communicate with the service, the web service, the database interface, or the database itself. Try to test each of these layers individually, or follow the character through the layers to check at which point it stops being correct.
Thanks Mar Byers, you gave me a hint. I have read up on Unicode planes and surrogates and how they work. I found that it is not actually a server problem: my MySQL server is version 5.5, which supports the utf8mb4 character set, and I have used that character set for the Comments column.
The problem was with SBJSON, which I used in my app to decode the JSON response from the server. SBJSON decoded the surrogate pair (two UTF-16 code units) into a single UTF-16 code unit, and that is why emoji encoded as a surrogate pair were not displayed. Apple introduced the NSJSONSerialization JSON library in iOS 5. I used that library to decode the JSON response and my code now works like a charm. :)
I got the solution from here
http://blog.manbolo.com/2011/12/12/supporting-ios-5-new-emoji-encoding
and here
http://blog.manbolo.com/2012/10/29/supporting-new-emojis-on-ios-6
Thanks to the Manbolo Blog.
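To make the surrogate arithmetic concrete, here is a small Python sketch of how the pair from the question combines into one code point. The function name is hypothetical, and the 16-bit truncation at the end is only an illustration that happens to match the reported \uf416; it is not a confirmed description of SBJSON's internals.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    assert 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF
    # Each surrogate carries 10 bits; the result is offset by 0x10000
    # because surrogate pairs only encode code points above the BMP.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDC16)
print(hex(code_point))           # 0x1f416
# Keeping only the low 16 bits reproduces the Private Use Area value seen.
print(hex(code_point & 0xFFFF))  # 0xf416
```

A decoder has to keep the full U+1F416 value (or the two original code units) for the emoji to survive the round trip.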

How can I properly display Vietnamese characters in ColdFusion?

I'm having a hard time trying to properly display Vietnamese text in ColdFusion. I've set the charset to UTF-8 but still no luck. The same text works fine in an HTML page. What else am I missing? Any suggestion would be much appreciated.
Html:
ColdFusion:
Thanks!
There are two things you need to watch out for, as far as I recall off the top of my head.
The first is to ensure that the .cfm file itself is saved as UTF-8 - this is a file system option, and will probably be settable in your editor. This ensures that the UTF-8 characters are correctly preserved when saving the file.
The other is that every .cfm file that includes any UTF-8 text should start with:
<cfprocessingdirective pageencoding="utf-8" />
This ensures that ColdFusion delivers the page to the browser in the correct format.
Just to be sure, when you display your working HTML, can you check the page encoding used by your browser (i.e. in Firefox you can right-click and choose Page Info)? Maybe your text is not UTF-8 encoded, which could explain the problem...

UTF8 encoding problem, same results work fine in wordpress

I have a WordPress installation that clients can edit, and all characters display OK there. On the main homepage I query the same database for the same title and post content, but some characters don't display correctly: they show up as a question mark.
I have tried sending the UTF-8 headers manually, through .htaccess and through meta tags. I have used SET NAMES utf8 (which turns the characters into the diamond symbol with a question mark inside).
I genuinely can't figure out what it could be now and I really need these characters to display correctly.
Here's the homepage. You can see in the Sounddhism 6 preview that there are lots of question marks; if you click on it you will see what they are meant to look like.
http://nottingham.subverb.net
I have passed it through the validator and it gives me this error:
Sorry, I am unable to validate this document because on line 373 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xA0" does not map to Unicode
Which, I appreciate, is supposed to help me, but I don't know what to do about it. Especially since, on that line, the character generating the error is supposed to be a space and comes AFTER the offending question marks.
Can anyone help?
Compare the encoding of the back-end scripts in WordPress with that of your homepage script. If you're using IE, right-click the page and check the encoding. Sometimes it's set to "Auto-detect" and IE will often detect a different encoding for different pages, causing strange issues like this.
If you're not using IE, try using a tool like Fiddler to see exactly what encoding is declared and what bytes are being sent back and forth, both in the back-end and in your homepage script.
If forcing UTF-8 on your homepage script doesn't work, I would guess that the back-end is not using UTF-8.
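The validator message in the question is consistent with this guess. As a rough illustration in Python (not a diagnosis of the actual site), a lone 0xA0 byte is not valid UTF-8, but it is a non-breaking space in Latin-1 and Windows-1252, which suggests that at least part of the content is being sent in one of those encodings:

```python
raw = b"\xa0"  # the byte reported by the validator

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xA0 on its own is a continuation byte with no lead byte before it.
    valid_utf8 = False

as_latin1 = raw.decode("latin-1")

print(valid_utf8)                # False
print(as_latin1 == "\u00a0")     # True: NO-BREAK SPACE in Latin-1
print("\u00a0".encode("utf-8"))  # b'\xc2\xa0': the same character as UTF-8
```

This would also explain why the offending character "is supposed to be a space".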

How to add Unicode comments to an MP3 using the LAME encoder

I want to add some comments to my MP3s, but my comments are all in non-Latin characters such as Arabic.
I have written a shell program to get the comments from the user on Windows, but since LAME.exe is a console program I don't know how to convert these non-Latin characters into something meaningful for LAME.
So is there any way to add these kinds of comments using LAME.exe?
Regards.
I think you're going to have real trouble doing it on the Windows command line, as everything will be working in the system default code page (ANSI) and not Unicode. You won't be able to use Arabic at all unless you're on an Arabic Windows install (ANSI=code page 1256; settable in the region options), and even then I'm not sure it'll actually use the right encoding.
In any case lame.exe is not a good choice for editing tags, as it's an audio encoder, which will decode and re-encode the MP3, causing quality loss.
There are many graphical apps that will batch re-tag MP3s. If you want a scriptable solution, you're probably better off with a higher-level language/library that supports Unicode better than the Windows command line/.bat files (e.g. Python + Mutagen, but there are many possibilities depending on which languages you're familiar with).
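As a rough sketch of the Python + Mutagen route, the following writes a Unicode comment frame without touching the encoded audio. It assumes Mutagen is installed (it is a third-party package); the function name, the default language code and the example file name are illustrative only.

```python
def add_unicode_comment(mp3_path: str, comment: str, lang: str = "ara") -> None:
    """Write a Unicode comment (COMM frame) into an MP3's ID3v2 tag."""
    # Imported lazily so this sketch can be loaded even without Mutagen.
    from mutagen.id3 import COMM, ID3, Encoding, ID3NoHeaderError

    try:
        tags = ID3(mp3_path)  # load the existing ID3v2 tag
    except ID3NoHeaderError:
        tags = ID3()          # the file has no tag yet; start an empty one

    # Encoding.UTF8 stores the text as Unicode rather than a legacy code page.
    tags.add(COMM(encoding=Encoding.UTF8, lang=lang, desc="", text=[comment]))
    tags.save(mp3_path)       # updates the tag; the audio is not re-encoded


# Example call (hypothetical file name):
# add_unicode_comment("song.mp3", "تعليق")
```

Because the comment is passed as a Python string rather than through console arguments, the system code page does not come into play.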