What's 8BITMIME? [closed]

What's 8BITMIME? What's the difference between 7-bit and 8-bit?
How should I understand them?

SMTP was originally specified using the pure ASCII character set. What a lot of people forget (or never get taught) is that the original ASCII is a 7-bit character set.
With much of the computing world using octets (8-bit bytes) or multiples thereof, some applications started, very unwisely, using the 8th bit for internal use, and so SMTP never got the chance to easily move to an 8-bit character set.
8BITMIME, which you can read about in excruciating detail in RFC 1652, or in a decent summary on Wikipedia, is a way for SMTP servers that support it to transmit email using 8-bit character sets in a standards-compliant way that won't break old servers.
In practice, most of the concerns that led to this sort of thing are obsolete, and a lot of SMTP servers will even happily send/receive 8-bit character sets in "plain" SMTP mode (though that's not exactly the wisest decision in the world, either), but we're left with this legacy because, well, "if it ain't broke" (for very strict definitions of "broke")...
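When a server advertises the extension in its EHLO response, the client declares an 8-bit body by adding BODY=8BITMIME to the MAIL FROM command. A minimal sketch of checking for it in Python (the host and addresses are placeholders, not anything from the question):

import smtplib

# Minimal sketch; host and addresses are placeholders.
with smtplib.SMTP("mail.example.com") as smtp:
    smtp.ehlo()
    if smtp.has_extn("8bitmime"):
        # The server advertises 8BITMIME, so the body may carry raw 8-bit bytes.
        smtp.sendmail(
            "sender@example.com",
            ["recipient@example.com"],
            "Subject: test\n\nUmlauts arrive intact: äöü\n".encode("utf-8"),
            mail_options=["BODY=8BITMIME"],
        )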

Related

Change method of encoding characters in postgreSQL [closed]

I have records in my database that contain Polish characters like ś, ć, ż, ź.
This becomes a problem when I try to execute some SELECT statements:
I get my text back, but instead of the characters I wrote above I get <c4><85>.
I bet there is a way to change the encoding, for example to UTF-8, but how can I do that for a simple query like select * from table?
As you've indicated this is on the console, you must first check your console encoding before starting psql.
See Unicode characters in Windows command line - how? for details of how to do this on Windows (the short version: chcp 65001 switches the console code page to UTF-8).
This must be done because even if you do get psql to read/write in UTF8, your console won't necessarily understand the characters and will not display them correctly.
Once you've confirmed that your console can accept UTF-8 encoding, make sure that psql has picked this encoding up:
show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
If that doesn't show UTF-8 then you can use:
set client_encoding = UTF8;
As a general rule, if your program expects UTF8 then there is no harm in setting the client encoding blindly (without checking what it is to start with).
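For example, an application can set it right after connecting. A minimal sketch, assuming the psycopg2 driver (connection parameters and table name are placeholders):

import psycopg2

# Force this session's client encoding to UTF8, regardless of the server default,
# then read the rows back as UTF-8 text.
conn = psycopg2.connect("dbname=mydb user=me")
conn.set_client_encoding("UTF8")
with conn.cursor() as cur:
    cur.execute("SELECT * FROM table1")
    for row in cur.fetchall():
        print(row)
conn.close()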
http://www.postgresql.org/docs/current/static/multibyte.html
Note:
The above link is for the current version. As the OP has asked for version 8.0, here is the link for the 8.0 manual:
See http://www.postgresql.org/docs/8.0/static/multibyte.html

Charset comparison [closed]

I need urgent help. I can't compare charset strings. A string written to database table1 is in UTF-8 but still looks strange: ＳＡＤＩ.
However, a string written to table2 in the same database is SADI, which is normal.
Whenever I compare the two, the comparison gives false.
Any idea how the comparison can be made? (It should actually give a true result.)
Or any idea how I can insert ＳＡＤＩ as SADI into the database?
Either would be a solution, hopefully.
In your strings, SADI is a standard ASCII string, but ＳＡＤＩ uses full-width Unicode characters.
For example, Ｓ is U+FF33 'FULLWIDTH LATIN CAPITAL LETTER S' (UTF-8: 0xEF 0xBC 0xB3),
but S is the standard ASCII U+0053 'LATIN CAPITAL LETTER S' (UTF-8: 0x53).
The other characters are likewise full-width Unicode characters, which look like standard Latin script but in reality are not.
How did they get there? That's a good question. Probably somebody got really creative and copy-pasted something from Word? Who knows.
You can convert these strange characters back to normal ones by applying Unicode normalization form NFKC (Normalization Form KC), for example with this Perl script as a filter (it accepts UTF-8 and outputs normalized UTF-8):
use Unicode::Normalize;
binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
while(<>) { print NFKC($_); }
In PHP:
$result = Normalizer::normalize( $str, Normalizer::FORM_KC );
Requires the intl extension
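The same normalization is also available from Python's standard library, if that is handier (a quick sketch reusing the full-width example above):

import unicodedata

# NFKC folds the full-width letters back to their plain ASCII equivalents.
print(unicodedata.normalize("NFKC", "ＳＡＤＩ"))   # prints: SADI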

Why do we need both UCS and Unicode character sets? [closed]

I guess the codepoints of UCS and Unicode are the same, am I right?
In that case, why do we need two standards (UCS and Unicode)?
They are not two standards. The Universal Character Set (UCS) is not a standard but something defined in a standard, namely ISO 10646. This should not be confused with encodings, such as UCS-2.
It is difficult to guess whether you actually mean different encodings or different standards. But regarding the latter, Unicode and ISO 10646 were originally two distinct standardization efforts with different goals and strategies. They were however harmonized in the early 1990s to avoid all the mess resulting from two different standards. They have been coordinated so that the code points are indeed the same.
They were kept distinct, though, partly because Unicode is defined by an industry consortium that can work flexibly and has great interest in standardizing things beyond simple code point assignments. The Unicode Standard defines a large number of principles and processing rules, not just the characters. ISO 10646 is a formal standard that can be referenced in standards and other documents of the ISO and its members.
The codepoints are the same but there are some differences.
From the Wikipedia entry about the differences between Unicode and ISO 10646 (i.e. UCS):
The difference between them is that Unicode adds rules and specifications that are outside the scope of ISO 10646. ISO 10646 is a simple character map, an extension of previous standards like ISO 8859. In contrast, Unicode adds rules for collation, normalization of forms, and the bidirectional algorithm for scripts like Hebrew and Arabic.
You might find it useful to read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
I think the differences come from the way the code points are encoded. UCS-x uses a fixed number of bytes to encode a code point. For example, UCS-2 uses two bytes, so it cannot encode code points that would require more than two bytes. On the other hand, UTF encodings use a variable number of bytes: UTF-8, for example, uses at least one byte (for ASCII characters) but more bytes if the character is outside the ASCII range.
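As a quick illustration of that last point, here is a small Python sketch (the characters are arbitrary examples) comparing how many bytes the same characters take in UTF-8 versus UTF-16/UCS-2:

# Each BMP character is a fixed two bytes in UCS-2/UTF-16,
# but one to three bytes in UTF-8.
for ch in ("A", "é", "€"):
    print(ch, ch.encode("utf-8").hex(), ch.encode("utf-16-be").hex())
# A 41 0041
# é c3a9 00e9
# € e282ac 20ac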

url decode for unicode [closed]

How can I decode Unicode characters from a URL? I specified response.charset="UTF-8" in my request, and I received percent-encoded characters like %e3%81%a4%e3%82%8c%e3%. How can I convert these to something I can display on my form?
RFC 3986 specifies how to interpret this. You first decode the percent-escaped byte values in the standard way. Then you interpret the byte stream as UTF-8 to reconstruct the characters. You can find more information here.
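In Python, for example, the standard library does both steps in one call; a quick sketch using a complete prefix of the string from the question (the trailing %e3% is truncated in the question and omitted here):

from urllib.parse import unquote

# unquote() percent-decodes the bytes and interprets them as UTF-8 by default.
print(unquote("%e3%81%a4%e3%82%8c"))   # prints the two Japanese characters つれ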

Why are there two slashes - forward and backward? [closed]

I'm totally confused about which one to use and when. The first thing I do when something goes wrong in code with a slash is replace one with the other, so my test cases double: one for / and one for \. Help me understand the logic behind slashes.
From the Wikipedia article about the backslash:
Bob Bemer introduced the \ character into ASCII on September 18, 1961, as the result of character frequency studies. In particular, the \ was introduced so that the ALGOL boolean operators "∧" (AND) and "∨" (OR) could be composed in ASCII as "/\" and "\/" respectively. Both these operators were included in early versions of the C programming language supplied with Unix V6, Unix V7 and, more currently, BSD 2.11.
/ is generally used to denote division as in 10/2 meaning 10 divided by 2. \ is generally used as an escape character as in \t or \n representing a tab and a newline character respectively.
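A short demonstration of both roles (Python, purely for illustration):

print(10 / 2)          # the slash is the division operator: 5.0
print("col1\tcol2")    # \t is an escape sequence for a tab character
print("line1\nline2")  # \n is an escape sequence for a newline
print("C:\\temp")      # a literal backslash must itself be escaped as \\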
There's no such thing as a "forward slash"; there's just a "slash" (/) and a "backslash" (\).
There's a long and, IMHO, hilarious discussion about that on the xkcd forum.
One more thing: the forward slash (/) is used to navigate the filesystem on *nix, as in /root/home/vs4vijay, while the backslash (\) is used on Windows, as in F:\Games\CounterStrike.
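If the question is really about file paths, the practical rule is simply that / separates path components on *nix (and in URLs), while Windows uses \. A small Python sketch (the paths are just examples):

from pathlib import PurePosixPath, PureWindowsPath

# The same kind of path, rendered with each system's separator.
print(PurePosixPath("/root", "home", "vs4vijay"))        # /root/home/vs4vijay
print(PureWindowsPath("F:/", "Games", "CounterStrike"))  # F:\Games\CounterStrike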