How to convert character to unicode? [closed] - unicode

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have this character.
&#8211
How do I convert this character to Unicode?
Sorry if it is a silly question.

It's not a silly question, character encoding can be tricky to get your head around. I highly recommend reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) (I'm sure you can guess the topic).
Unicode itself isn't an encoding; it's a very long list of characters and their code points. What I'm guessing you want to do is display the en dash character (U+2013, which is 8211 in decimal) in some way. Where do you want to display or store the data? If it's in a browser, then that representation should work, as that's the HTML-encoded version. If you want to store it in a database, then you'll need to convert that encoded version to a string and then convert that string to whatever encoding the database is using.
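As a rough illustration of the "decode the HTML version, then encode it for storage" step, here is a minimal Python sketch. Python itself and UTF-8 as the database encoding are assumptions; the question names neither a language nor a database.

```python
import html

# The reference from the question, written with its closing semicolon.
encoded = "&#8211;"

# Step 1: turn the HTML character reference into an ordinary string.
text = html.unescape(encoded)

# The string now holds a single character, the en dash.
code_point = ord(text)

# Step 2: encode the string for storage. UTF-8 is an assumption here;
# use whatever encoding the database actually expects.
stored_bytes = text.encode("utf-8")

print(f"U+{code_point:04X}", stored_bytes.hex(" "))
```

The code point (U+2013) is the same whichever encoding is chosen; only the stored bytes change.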

Take a look at this page, which shows the character in several different encodings and formats:
http://www.fileformat.info/info/unicode/char/2013/index.htm
Note that each language has its own rules for how to write it in a string or char literal.

Related

ISO-8859-9/Latin-9 encoding [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I know there is ISO-8859-9/Latin-5 and ISO-8859-15/Latin-9, but recently I had to handle some messages described as being encoded in "ISO-8859-9/Latin-9" format.
What exactly does that mean?
There is ISO-8859-9 which is called Latin-5.
And there is ISO-8859-15 which is called Latin-9.
Yes, it is confusing. In my opinion it's simplest to always only use the ISO-8859-n moniker. That avoids potential confusions.
So "ISO-8859-9/Latin-9" is probably a typo (or someone wrongly thought that the suffix is identical for the "ISO-8859-" and the "Latin-" prefix).
Depending on the source of the data, you can guess which one they meant. ISO-8859-9 is used for Turkish text and ISO-8859-15 is basically the modern replacement for ISO-8859-1 (covering most of Western Europe, mostly used because it has the € symbol).
Source: ISO/IEC 8859 Wiki page.
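One way to guess which encoding was meant is to decode a sample of the data both ways and see which result reads sensibly. The Python sketch below does that; the sample bytes are made up for illustration (the Turkish word "Işık" as ISO-8859-9 would store it) and are not taken from the asker's messages.

```python
# Hypothetical sample: the Turkish word "Işık" stored as ISO-8859-9 bytes.
raw = b"I\xfe\xfdk"

as_latin5 = raw.decode("iso8859_9")    # ISO-8859-9, also called Latin-5
as_latin9 = raw.decode("iso8859_15")   # ISO-8859-15, also called Latin-9

# Byte 0xA4 is the euro sign in ISO-8859-15 but the generic
# currency sign in ISO-8859-9.
euro_15 = b"\xa4".decode("iso8859_15")
euro_9 = b"\xa4".decode("iso8859_9")

# Python also accepts the "Latin-n" names as aliases for these codecs.
same_as_alias = raw.decode("latin5") == as_latin5

print(as_latin5, as_latin9, euro_15, euro_9, same_as_alias)
```

If the text is Turkish, only the ISO-8859-9 decoding produces the dotless ı and the ş; if the data contains € signs, ISO-8859-15 is the likelier candidate.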

Map special characters to URL safe, readable versions [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I am looking for a mapping table, a Perl module, or anything else that makes it possible to map characters to a URL-safe version that is also readable.
I need to build URLs without any special characters. The base words are city names in their native language, which means they can contain special characters from that language.
For example, when I have something like the Polish city name 'łódź', I need to get a readable version like 'lodz'.
The major browsers show and accept non-ASCII characters in the URL bar even if they need to be encoded during transmission.
For example,
http://.../city/Montr%C3%A9al
will appear as
http://.../city/Montréal
in the browser's URL bar. [Test]
But if you want to convert to a subset of ASCII, you'd start by using Text::Unidecode's unidecode. Then you gotta decide what to do with the characters that must be escaped in URLs.
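Text::Unidecode is a Perl module. The sketch below is not that module: it swaps in a cruder technique in Python, using only the standard library (Unicode decomposition plus a small hand-written table for letters such as ł that do not decompose). The names `EXTRA` and `slugify` are hypothetical, and the table is far from complete.

```python
import re
import unicodedata
from urllib.parse import quote, unquote

# Letters with no Unicode decomposition need an explicit mapping.
# This table is illustrative only, not exhaustive.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ß": "ss"})

def slugify(name: str) -> str:
    """Reduce a name to lowercase ASCII letters, digits and hyphens."""
    text = name.translate(EXTRA)
    # NFKD splits "ó" into "o" plus a combining accent, and so on.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything non-ASCII removes the combining accents.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Whatever is still not a letter or digit becomes a hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")

print(slugify("łódź"), slugify("Montréal"), slugify("São Paulo"))

# The alternative from the answer: keep the name and percent-encode it.
encoded = quote("Montréal")
print(encoded, unquote(encoded))
```

The final regular expression is where the "what to do with characters that must be escaped" decision is made; here they are simply collapsed into hyphens.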

How does compiler understand Unicode characters so quickly? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I recently wrote a document-based program.
What intrigues me is how a compiler (in my case, for Objective-C) can convert any character into Unicode so fast when these characters are only visual representations.
I think A~Z and other common characters can probably be converted from ASCII to Unicode very easily. But what about special characters such as brand and copyright symbols?
I am solely interested in the internal workings of such a conversion.
Example:
How does the compiler understand what "©" is in the blink of an eye? Is it by looking it up in a Unicode table? But if I have 1000000 "©" characters, does my compiler look them up in the table 1000000 times? That would be very time-consuming, wouldn't it?
The compiler doesn't see "©". It sees whatever numerical representation of "©" occurs in the source file it's processing. No lookup is needed, because it's already in the form the compiler uses. (Some conversions might be needed if, for example, the source file is in UTF-8 and the compiler uses UTF-32 internally, but such conversions don't require a full Unicode table.)
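To make the "no table needed" point concrete, here is a small Python sketch of how one UTF-8 sequence can be turned into a code point with nothing but bit shifts and masks. It is an illustration of the UTF-8 layout, not a description of what any particular compiler does, and the function name `decode_utf8_char` is made up for the example. It skips some checks a real decoder needs, such as rejecting overlong forms.

```python
def decode_utf8_char(data: bytes) -> int:
    """Return the code point encoded by the first UTF-8 sequence in data."""
    first = data[0]
    if first < 0x80:                 # 0xxxxxxx: plain ASCII, one byte
        return first
    if first >> 5 == 0b110:          # 110xxxxx: one continuation byte
        extra, cp = 1, first & 0x1F
    elif first >> 4 == 0b1110:       # 1110xxxx: two continuation bytes
        extra, cp = 2, first & 0x0F
    elif first >> 3 == 0b11110:      # 11110xxx: three continuation bytes
        extra, cp = 3, first & 0x07
    else:
        raise ValueError("invalid leading byte")
    if len(data) < extra + 1:
        raise ValueError("truncated sequence")
    for byte in data[1:1 + extra]:
        if byte >> 6 != 0b10:        # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (byte & 0x3F)
    return cp

# In a UTF-8 source file, the copyright sign is the two bytes C2 A9.
copyright_cp = decode_utf8_char(b"\xc2\xa9")
print(f"U+{copyright_cp:04X}")
```

The work per character is a handful of integer operations, so a million copyright signs cost a million tiny arithmetic steps rather than a million table searches.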

What is Unicode? and how Encoding works? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
A few hours ago I was reading a C programming book and came across the terms character encoding and Unicode. I started googling for information about Unicode and learned that the Unicode character set contains every character from every language, and that UTF-8, UTF-16 and UTF-32 can encode the characters listed in it.
But I was not able to understand how it works.
Does Unicode depend on the operating system?
How is it related to software and programs?
Is UTF-8 a piece of software that was installed on my computer when I installed the operating system?
Or is it related to hardware?
And how does a computer encode things?
I have found it all very confusing. Please answer in detail.
I am new to these things, so please keep that in mind when you answer.
Thank you.
I have written about this extensively in What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text. Here are some highlights:
encodings are plentiful, encodings define how a "character" like "A" can be encoded as bits and bytes
most encodings only specify this for a small number of selected characters; for example all (or at least most) characters needed to write English or Czech; single byte encodings typically support a set of up to 256 characters
Unicode is one large standard effort which has catalogued and specified a number ⟷ character relationship for virtually all characters and symbols of every major language in use, which comes to well over a hundred thousand characters
UTF-8, 16 and 32 are different sub-standards for how to encode this ginormous catalog of numbers to bytes, each with different size tradeoffs
software needs to specifically support Unicode and its UTF-* encodings, just like it needs to support any other kind of specialized encoding; most of the work is done by the OS these days which exposes supporting functions to an application
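The highlights above can be seen directly by encoding the same characters several ways. This Python sketch is only an illustration; the sample characters and the choice of ISO-8859-2 as the single-byte encoding (it covers Czech) are assumptions made for the example.

```python
samples = ["A", "\u010d"]   # "A" and the Czech letter c with caron
encodings = ["iso8859_2", "utf-8", "utf-16-be", "utf-32-be"]

# Same character, same Unicode number, different bytes per encoding.
table = {ch: {enc: ch.encode(enc) for enc in encodings} for ch in samples}

for ch, row in table.items():
    for enc, data in row.items():
        print(f"U+{ord(ch):04X}  {enc:<10} {data.hex(' ')}")

# A single-byte encoding only covers its selected set of characters.
try:
    "\u266f".encode("iso8859_2")    # the music sharp sign
    sharp_fits = True
except UnicodeEncodeError:
    sharp_fits = False

print("sharp sign fits in ISO-8859-2:", sharp_fits)
```

The code point never changes; UTF-8 spends one or two bytes on these characters, UTF-16 two and UTF-32 always four, which is the size tradeoff mentioned above.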

i saw musical symbol in html plain text, but any know how exactly it happen? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
♯
♭
I saw these two symbols and copied them.
I tried to find them as HTML entities or special characters, but I couldn't get any result.
I can't find any information on Google either, because these symbols are not searchable.
Can anyone explain which standard these flat and sharp musical symbols come from?
And how can I type or generate them and their siblings?
♯
♭
♪
♬
♫
The standard used to define the characters is Unicode.
See Unicode Miscellaneous Symbols (includes common music symbols like ♯) and Unicode Musical Symbols (other music symbols). I did a search for "unicode musical symbols"; there are many more hits.
Happy coding.
See How to enter Unicode characters in Microsoft Windows, or use the Windows Character Map. However, you need to know the code point (or the general code point area) :-)
Other operating systems have different input methods and utilities.
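If the goal is to generate the symbols from a program rather than type them, knowing the code point or the official character name is enough. The following Python sketch shows a few equivalent ways; Python is an assumption here, since the question is not tied to any language.

```python
import unicodedata

# List the code point and official name of each symbol from the question.
symbols = "♯♭♪♬♫"
for ch in symbols:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch)

# Going the other way: build the characters from what you know.
sharp = chr(0x266F)                                   # from the code point
flat = "\u266d"                                       # escape in a literal
note = "\N{EIGHTH NOTE}"                              # from the name
natural = unicodedata.lookup("MUSIC NATURAL SIGN")    # a sibling, by name

print(sharp, flat, note, natural)
```

All of these sit in the range U+266A to U+266F, so browsing that area of a character map also turns up the siblings.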
A quick Google search finds the following page, which lists entity codes for musical notes:
http://www.danshort.com/HTMLentities/index.php?w=music
These symbols are in Unicode, and you can insert any Unicode character by putting this in HTML/XHTML markup:
&#x266C;
This gives ♬; i.e. you write &#x, follow it with the hex code of the character, and end it with ;
P.S: This technique is used as the last resort when facing character encoding problems.
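The numeric references described in this answer can be produced mechanically from the characters themselves. Here is a small Python sketch; the helper name `to_hex_reference` is invented for the example.

```python
import html

def to_hex_reference(ch: str) -> str:
    """Build the &#x...; HTML reference for a single character."""
    return f"&#x{ord(ch):X};"

# References for the symbols listed in the question.
references = {ch: to_hex_reference(ch) for ch in "♯♭♪♫♬"}
for ch, ref in references.items():
    print(ch, ref)

# Decoding works for both the hexadecimal and the decimal form.
from_hex = html.unescape("&#x266C;")
from_decimal = html.unescape("&#9836;")
print(from_hex, from_decimal)
```

When the page itself is served as UTF-8, the characters can also be pasted directly into the markup, which is why references like these are usually kept as a fallback.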
explain how this flat and sharp musical symbol exist in which standard?
Unicode
and how to type or generate them and any siblings?
There are utilities for picking characters from unicode distributed with most operating systems.