What languages does the character encoding UTF-8 support? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
What languages does UTF-8 support?
And how many languages does UTF-8 support?

See the page Supported Scripts on unicode.org. UTF-8 supports all Unicode characters.
Note that Unicode defines character encodings, not languages.
The Unicode Standard encodes scripts rather than languages per se. ...

UTF-8 is supposed to be able to represent any Unicode character.
http://en.wikipedia.org/wiki/UTF-8
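To make the "scripts, not languages" point concrete, here is a minimal Python sketch that encodes a few characters from different scripts as UTF-8. The sample characters and the names `samples` and `utf8_lengths` are my own illustrative choices, not something taken from the answers above.

```python
# Sample characters from different scripts (illustrative choices only).
samples = {
    "Latin": "A",
    "Latin with accent": "é",
    "Cyrillic": "Ж",
    "Han": "中",
    "Emoji": "😀",
}

# Number of UTF-8 bytes needed for each sample character.
utf8_lengths = {name: len(text.encode("utf-8")) for name, text in samples.items()}

for name, text in samples.items():
    encoded = text.encode("utf-8")
    # Decoding the bytes must give back the original text.
    assert encoded.decode("utf-8") == text
    print(f"{name}: U+{ord(text):04X} -> {encoded.hex(' ')} ({len(encoded)} bytes)")
```

Every sample round-trips, and the only thing that changes from script to script is how many bytes (1 to 4) each code point needs.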

Related

ISO-8859-9/Latin-9 encoding [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I know that ISO-8859-9/Latin-5 and ISO-8859-15/Latin-9 exist, but recently I had to handle some messages said to be encoded in "ISO-8859-9/Latin-9" format.
What exactly does that mean?
There is ISO-8859-9 which is called Latin-5.
And there is ISO-8859-15 which is called Latin-9.
Yes, it is confusing. In my opinion it's simplest to always only use the ISO-8859-n moniker. That avoids potential confusions.
So "ISO-8859-9/Latin-9" is probably a typo (or someone wrongly thought that the suffix is identical for the "ISO-8859-" and the "Latin-" prefix).
Depending on the source of the data, you can guess which one they meant. ISO-8859-9 is used for Turkish text and ISO-8859-15 is basically the modern replacement for ISO-8859-1 (covering most of Western Europe, mostly used because it has the € symbol).
Source: ISO/IEC 8859 Wiki page.
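If you have sample bytes from the messages, one way to guess which encoding was meant is to decode the same bytes with each candidate and see which result looks sensible. A minimal sketch, assuming Python's built-in codecs; the byte values in `raw` and the helper name `decode_candidates` are only illustrative:

```python
# Bytes that differ between the three encodings (illustrative selection).
raw = bytes([0xA4, 0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE])

def decode_candidates(data: bytes) -> dict:
    """Decode the same bytes with each candidate ISO-8859 part."""
    candidates = ["iso-8859-1", "iso-8859-9", "iso-8859-15"]
    return {name: data.decode(name) for name in candidates}

decoded = decode_candidates(raw)
for name, text in decoded.items():
    print(f"{name}: {text}")

# Python also accepts the "Latin-n" names as codec aliases:
# "latin5" is ISO-8859-9 and "latin9" is ISO-8859-15.
euro = b"\xa4".decode("latin9")
dotless_i = b"\xfd".decode("latin5")
```

If the text is Turkish, the ISO-8859-9 line should show letters such as Ğ, İ and ş; if byte 0xA4 is meant to be a euro sign, ISO-8859-15 is the likelier candidate.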

How does compiler understand Unicode characters so quickly? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have made a document-based program lately.
But what intrigues me is how a compiler (in my case, for Objective-C) can convert any character into Unicode so fast when these characters are only visual representations.
I think A~Z and all the other common characters can be converted from ASCII to Unicode very easily. But what about other special characters, such as a brand icon or the copyright sign?
I am solely interested in the internal working of such conversion.
Example:
How does the compiler understand what "©" is in the blink of an eye? Is it by looking it up in a Unicode table? But if I have 1000000 "©" characters, does my compiler look them up in the table 1000000 times? That is very time-consuming, isn't it?
The compiler doesn't see "©". It sees whatever numerical representation of "©" occurs in the source file it's processing. No lookup is needed, because it's already in the form the compiler uses. (Some conversions might be needed if, for example, the source file is in UTF-8 and the compiler uses UTF-32 internally, but such conversions don't require a full Unicode table.)
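To show why no table is needed for such a conversion, here is a rough Python sketch of turning the UTF-8 bytes of one character into its code point with nothing but bit operations. The function name `decode_utf8_code_point` is hypothetical, and the sketch skips the validation (overlong forms, bad continuation bytes) that a real decoder would do.

```python
def decode_utf8_code_point(data: bytes) -> int:
    """Return the code point of the first UTF-8 sequence in data (no validation)."""
    first = data[0]
    if first < 0x80:                 # 0xxxxxxx: plain ASCII, one byte
        return first
    if first >> 5 == 0b110:          # 110xxxxx: two-byte sequence
        length, code_point = 2, first & 0x1F
    elif first >> 4 == 0b1110:       # 1110xxxx: three-byte sequence
        length, code_point = 3, first & 0x0F
    elif first >> 3 == 0b11110:      # 11110xxx: four-byte sequence
        length, code_point = 4, first & 0x07
    else:
        raise ValueError("not a valid UTF-8 lead byte")
    # Each continuation byte (10xxxxxx) contributes its low six bits.
    for byte in data[1:length]:
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point

copyright_bytes = "©".encode("utf-8")   # b'\xc2\xa9'
copyright_code_point = decode_utf8_code_point(copyright_bytes)
print(hex(copyright_code_point))
```

The work per character is a few shifts and masks, so a million "©" characters cost a million cheap arithmetic steps, not a million table searches.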

How to convert character to unicode? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have this character.
&#8211
How do I convert this character to Unicode?
Sorry if it is a silly question.
It's not a silly question; character encoding can be tricky to get your head around. I highly recommend reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) (I'm sure you can guess the topic).
Unicode itself isn't an encoding, it's a very long list of characters and code points. What I'm guessing you want to do is display the dash character in some way. Where are you wanting to display or store the data? If it's in a browser, then that representation should work as that's the HTML encoded version. If you want to store it in a database then you'll need to convert that encoded version to a string and then convert that string to whatever encoding the database is using.
Take a look at this source, which shows the character in different encodings and formats:
http://www.fileformat.info/info/unicode/char/2013/index.htm
Note, however, that each programming language has its own rules for how to write this in a string/char literal.
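As one possible way to do the conversion in code, here is a small Python sketch using the standard `html` module. It assumes the reference is meant to be the complete form `&#8211;` (with the trailing semicolon); the variable names are only illustrative.

```python
import html

# Assumption: the intended reference is the complete form with a semicolon.
reference = "&#8211;"

# Turn the HTML numeric character reference into the actual character.
char = html.unescape(reference)

# 8211 in decimal is 0x2013, the code point of the en dash.
code_point = ord(char)

# The same character in two concrete encodings.
utf8_bytes = char.encode("utf-8")
utf16le_bytes = char.encode("utf-16-le")

print(f"U+{code_point:04X}", utf8_bytes.hex(" "), utf16le_bytes.hex(" "))
```

Once you have the character as a string, encoding it for a database or a file is just a matter of picking the target encoding in the final step.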

Why does windows notepad give possibility to save document in unicode and in utf-8? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
UTF-8 "is a variable-width encoding that can represent every character in the Unicode character set" (Wikipedia); Unicode is a "standard for the consistent encoding, representation and handling of text" (Wikipedia). They're different things. Why does Windows Notepad offer to save a document in either Unicode or UTF-8? How can I compare two different things?
To simplify,
Unicode says what number should represent each character.
UTF-8 says how to arrange the bits to form different strings of Unicode values.
According to this thread, what "Unicode" means in Notepad is UTF-16 Little Endian (UTF-16LE), which is another way of arranging the bits to form strings of Unicode values.
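A short Python sketch of that distinction: the code point is the number Unicode assigns, and UTF-8 and UTF-16LE are two different byte layouts for the same number. The helper name `describe` and the sample characters are my own choices for illustration.

```python
def describe(char: str) -> dict:
    """Return the Unicode code point and two byte layouts for one character."""
    return {
        "code_point": ord(char),              # the number Unicode assigns
        "utf-8": char.encode("utf-8"),        # variable-width, 1 to 4 bytes
        "utf-16-le": char.encode("utf-16-le") # 2 or 4 bytes, low byte first
    }

e_acute = describe("é")
euro = describe("€")

for info in (e_acute, euro):
    print(f"U+{info['code_point']:04X}",
          "utf-8:", info["utf-8"].hex(" "),
          "utf-16-le:", info["utf-16-le"].hex(" "))
```

Both layouts decode back to the same code point, which is why Notepad can list them side by side as save options.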

GB18030 vs Big5 Chinese character encodings sizewise [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
We have two encodings available for Chinese characters: GB18030 for Simplified Chinese and Big5 for Traditional Chinese.
How many bytes (octets) would a single Chinese character take in each encoding?
Going by Wikipedia:
GB18030 - Guójiā Biāozhǔn (国家标准) is an encoding scheme of up to 4 octets (bytes). It is variable-width: a character takes 1, 2, or 4 octets, and the most common Chinese characters take 2. See also GB18030 - New Chinese Encoding Standard.
Big-5 or Big5 is a 2-octet (byte) encoding scheme. Here every Chinese character takes 2 octets.
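One way to check the sizes yourself is to encode a few characters with Python's built-in `gb18030` and `big5` codecs and count the bytes. This is only a sketch; the sample characters and the helper name `encoded_length` are my own choices.

```python
def encoded_length(char: str, encoding: str) -> int:
    """Number of bytes one character takes in the given encoding."""
    return len(char.encode(encoding))

common_han = "中"            # a very common Chinese character
rare_han = chr(0x20000)      # a CJK Extension B character, outside the BMP

gb18030_lengths = {
    "ascii": encoded_length("A", "gb18030"),
    "common_han": encoded_length(common_han, "gb18030"),
    "rare_han": encoded_length(rare_han, "gb18030"),
}
big5_common_han = encoded_length(common_han, "big5")

print(gb18030_lengths, big5_common_han)
```

With these samples GB18030 uses 1, 2 and 4 bytes respectively, while the common character takes 2 bytes in Big5.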