Change method of encoding characters in PostgreSQL [closed] - postgresql

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I have records in my database that contain Polish characters such as ś, ć, ż, ź.
This becomes a problem when I try to execute some SELECT statements,
because I get my text back, but instead of the characters above I see byte sequences like <c4><85>.
I bet there is a way to change the encoding, for example to UTF-8, but how can I do that for a simple query like select * from table?

As you've indicated this is on the console, you must first check your console encoding before starting psql.
See "Unicode characters in Windows command line - how?" for details of how to do this on Windows.
This must be done because even if you do get psql to read and write UTF8, your console won't necessarily understand the characters and may not display them correctly.
Once you've confirmed that your console can accept UTF-8 encoding, make sure that psql has picked this encoding up:
show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
If that doesn't show UTF-8 then you can use:
set client_encoding = UTF8;
As a general rule, if your program expects to use UTF8, there is no harm in setting the client encoding blindly (without checking what it is to start with).
http://www.postgresql.org/docs/current/static/multibyte.html
Note:
The above link is for the current version. As the OP has asked for version 8.0, here is the link for the 8.0 manual:
See http://www.postgresql.org/docs/8.0/static/multibyte.html
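The <c4><85> in the question looks like raw UTF-8 bytes shown in hex rather than a rendered character. As a minimal sketch (Python is used here only for illustration and is not part of the original answer), decoding those two bytes as UTF-8 yields a single Polish letter, which suggests the stored data is fine and only the display side is not interpreting UTF-8:

```python
# The two bytes the console displayed as <c4><85>.
raw = bytes([0xC4, 0x85])

# Decoded as UTF-8 they form one character, U+0105 (a with ogonek).
char = raw.decode("utf-8")
print(char, f"U+{ord(char):04X}")

# Decoded with a single-byte encoding such as Latin-1, the same bytes
# become two unrelated characters, which is the kind of garbling a
# mismatched console or client encoding produces.
wrong = raw.decode("latin-1")
print(len(wrong))
```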

Related

Charset comparison [closed]

Closed 10 years ago.
I need urgent help. I can't compare two strings. A string written to table1 in a database is in the utf-8 charset but still looks strange: ＳＡＤＩ
However, a string written to table2 in the same database is SADI, which is normal.
Whenever I compare the two, the result is false.
Any idea how the comparison can be made? (The comparison should actually give true.)
Any idea how I can insert ＳＡＤＩ as SADI into the database?
Either would hopefully be a solution.
In your strings, SADI is a standard ASCII string, but ＳＡＤＩ uses full-width Unicode characters.
For example, Ｓ is U+FF33 'FULLWIDTH LATIN CAPITAL LETTER S' (UTF-8: 0xEF 0xBC 0xB3),
but S is standard ASCII U+0053 'LATIN CAPITAL LETTER S' (UTF-8: 0x53).
The other characters are similar extended Unicode characters, which look like standard Latin script but in reality are not.
How did they get there? That's a good question. Probably somebody got really creative and copy-pasted something from Word? Who knows.
You can convert these strange characters back to normal ones by applying Unicode Normalization Form KC (NFKC), for example with this Perl script used as a filter (it accepts UTF-8 and outputs normalized UTF-8):
use Unicode::Normalize;
binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
while(<>) { print NFKC($_); }
In PHP:
$result = Normalizer::normalize( $str, Normalizer::FORM_KC );
This requires the intl extension.
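The same NFKC normalization can be sketched in Python with the standard unicodedata module. This is only an illustration of the technique described above, not code from the original answers; the variable names are made up for the example.

```python
import unicodedata

# Full-width letters S, A, D, I (U+FF33, U+FF21, U+FF24, U+FF29).
fullwidth = "\uFF33\uFF21\uFF24\uFF29"
ascii_text = "SADI"

# A direct comparison fails because the code points differ.
print(fullwidth == ascii_text)

# NFKC folds compatibility characters such as full-width letters
# into their ordinary equivalents, so the comparison then succeeds.
normalized = unicodedata.normalize("NFKC", fullwidth)
print(normalized == ascii_text)
```

Normalizing both sides before comparing (or before inserting into the database) keeps the two tables consistent.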

How does one decode Base64? [closed]

Closed 10 years ago.
As seen here, they have the line ZG9udGJlYWhhdGVyc3RhcnR1cCtoYWNrZXJuZXdzQGdtYWlsLmNvbQ==.
How would one go about decoding this line of Base64?
Base64-decode it. For example, put it in this online decoder: http://www.opinionatedgeek.com/dotnet/tools/base64decode/
BTW this is not encryption, it's encoding.
This is simple Base64 encoding; one way to decode it is to use openssl:
echo 'ZG9udGJlYWhhdGVyc3RhcnR1cCtoYWNrZXJuZXdzQGdtYWlsLmNvbQ==' | openssl base64 -d
Use a base64 decoder. Or - specify a language you would like to use and I can give you some example code.
BTW: I decoded this using this online decoder:
http://www.convertstring.com/EncodeDecode/Base64Decode
It decodes to
dontbeahaterstartup+hackernews#gmail.com
Base64 encoding is explained here: Base64 decoder
ZG9udGJlYWhhdGVyc3RhcnR1cCtoYWNrZXJuZXdzQGdtYWlsLmNvbQ== => dontbeahaterstartup+hackernews#gmail.com
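For anyone who would rather not paste the string into a website, here is a small sketch using Python's standard base64 module. It assumes the decoded bytes are plain ASCII text, which holds for this particular string.

```python
import base64

encoded = "ZG9udGJlYWhhdGVyc3RhcnR1cCtoYWNrZXJuZXdzQGdtYWlsLmNvbQ=="

# b64decode returns bytes; decode them to get readable text.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)

# Encoding the result again reproduces the original string, which
# shows this is a reversible encoding and not encryption.
round_trip = base64.b64encode(decoded.encode("ascii")).decode("ascii")
print(round_trip == encoded)
```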

What's the Unicode code point for '¿'? [closed]

Closed 12 years ago.
Google hasn't been a help.
A quick search on Google for "unicode for upside down question mark" led me to a Wikipedia article, which stated that
The inverted question mark (¿) corresponds to Unicode code-point 191 (U+00BF)
¿ɹoɟ buıʞooן ǝɹǝʍ noʎ ʇɐɥʍ ʇɐɥʇ sı
If you want to obtain the Unicode value of a character, you can use this simple JavaScript:
javascript:alert("¿".charCodeAt(0))
This will alert the Unicode value of the character. If you want to use it in HTML, the syntax is & #191; (without the space between & and #), where 191 is the Unicode number of your character.
I use this site as search tool for unicode characters. Here are the search results for ¿. It has one result: Unicode Character 'INVERTED QUESTION MARK' (U+00BF).
Useful site.
According to Ubuntu's gucharmap:
U+00BF INVERTED QUESTION MARK
General Character Properties
In Unicode since: 1.1
Unicode category: Punctuation, Other
Various Useful Representations
UTF-8: 0xC2 0xBF
UTF-16: 0x00BF
C octal escaped UTF-8: \302\277
XML decimal entity: &#191;
Annotations and Cross References
Alias names:
• turned question mark
Notes:
• Spanish
See also:
• U+003F QUESTION MARK
• U+2E2E REVERSED QUESTION MARK
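The representations listed above can be reproduced with a few lines of Python. This is a sketch for checking the values yourself, not something taken from gucharmap; it relies only on built-ins and the standard unicodedata module.

```python
import unicodedata

ch = "\u00bf"  # the inverted question mark

code_point = ord(ch)
print(code_point)                  # decimal code point
print(f"U+{code_point:04X}")       # conventional U+ notation
print(unicodedata.name(ch))        # official character name

utf8 = ch.encode("utf-8")
hex_bytes = " ".join(f"0x{b:02X}" for b in utf8)
octal_escape = "".join(f"\\{b:o}" for b in utf8)
xml_entity = f"&#{code_point};"
print(hex_bytes, octal_escape, xml_entity)
```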
Google is a help ALWAYS:
http://www.google.com/search?hl=pl&q=unicode+for+inverted+question+mark&aq=f&aqi=&aql=&oq=&gs_rfai=
and:
http://www.fileformat.info/info/unicode/char/bf/index.htm
answer: U+00BF
If you know Java you can print it like this:
$ cat UnicodeTest.java
public class UnicodeTest {
    public static void main( String [] args ) {
        System.out.println( ( int ) '¿' );
    }
}
$ javac -encoding UTF8 UnicodeTest.java
$ java UnicodeTest
191
Answer 191
Java's characters are unicode.
BTW, ¡that's not an upside-down question mark! It is an "opening" question mark. It's just that not everyone uses it, in the same way that '(' is not an upside-down parenthesis.
Unicode table might be helpful 00A01F.

What's 8BITMIME? [closed]

Closed 9 years ago.
What's 8BITMIME? What's the difference between 7bit and 8bit?
How should I understand them?
SMTP was originally specified using the pure ASCII character set. What a lot of people forget (or never get taught) is that the original ASCII is a 7-bit character set.
With much of the computing world using octets (8-bit bytes) or multiples thereof, some applications started, very unwisely, using the 8th bit for internal use, and so SMTP never got the chance to easily move to an 8-bit character set.
8BITMIME, which you can read about in excruciating detail in RFC 1652, or in a decent summary on Wikipedia, is a way for SMTP servers that support it to transmit email using 8-bit character sets in a standards-compliant way that won't break old servers.
In practice, most of the concerns that led to this sort of thing are obsolete, and a lot of SMTP servers will even happily send/receive 8-bit character sets in "plain" SMTP mode (though that's not exactly the wisest decision in the world, either), but we're left with this legacy because, well, "if it ain't broke" (for very strict definitions of "broke")...
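To make the 7-bit versus 8-bit distinction concrete, here is a small sketch in Python. It says nothing about how a real SMTP server negotiates 8BITMIME; the helper is_7bit_clean is a hypothetical name invented for this example, and it only checks whether any byte in a message body has its 8th (high) bit set.

```python
def is_7bit_clean(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (values 0 to 127)."""
    return all(b < 0x80 for b in data)


# Pure ASCII text never uses the high bit.
ascii_body = "Hello, world".encode("ascii")

# Non-ASCII text encoded as UTF-8 uses bytes of 0x80 and above,
# so it needs an 8-bit-safe path (or a 7-bit transfer encoding).
utf8_body = "za\u017c\u00f3\u0142\u0107".encode("utf-8")

print(is_7bit_clean(ascii_body))
print(is_7bit_clean(utf8_body))
```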

Why are there two slashes - forward and backward? [closed]

Closed 12 years ago.
I'm totally confused about which one to use and when. The first thing I do when something goes wrong in code involving a slash is replace one with the other, so my test cases double: one for / and one for \. Help me understand the logic behind slashes.
From the Wikipedia article about the backslash:
Bob Bemer introduced the \ character into ASCII, on September 18, 1961, as the result of character frequency studies. In particular the \ was introduced so that the ALGOL boolean operators "∧" (AND) and "∨" (OR) could be composed in ASCII as "/\" and "\/" respectively.[4] Both these operators were included in early versions of the C programming language supplied with Unix V6, Unix V7 and more currently BSD 2.11.
/ is generally used to denote division as in 10/2 meaning 10 divided by 2. \ is generally used as an escape character as in \t or \n representing a tab and a newline character respectively.
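A short Python sketch of those two roles (the variable names are only for illustration): / performs division, while \ inside a string literal starts an escape sequence, so "\t" is one character, not two.

```python
# Slash as the division operator.
quotient = 10 / 2
print(quotient)

# Backslash as the escape character: each of these is a single character.
tab = "\t"
newline = "\n"
print(len(tab), ord(tab))
print(len(newline), ord(newline))

# To get a literal backslash followed by 't', escape the backslash
# itself or use a raw string; both spell the same two characters.
literal = "\\t"
raw = r"\t"
print(len(literal), literal == raw)
```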
There's nothing like a "forward slash". There's a "slash" / and a "backslash" \.
There's a long and, IMHO, hilarious discussion about that on the xkcd forum.
One more thing...
The forward slash / is used on *nix to navigate the filesystem,
like /root/home/vs4vijay,
and the backslash \ is used on Windows,
like F:\Games\CounterStrike
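The two path conventions can be compared side by side with Python's pathlib, whose "pure" path classes parse either style on any operating system. This is just a sketch using the example paths from above; nothing here touches the real filesystem.

```python
from pathlib import PurePosixPath, PureWindowsPath

# *nix style: components separated by forward slashes.
posix = PurePosixPath("/root/home/vs4vijay")
print(posix.parts)

# Windows style: a drive letter plus backslash separators.
# A raw string keeps the backslashes from being read as escapes.
windows = PureWindowsPath(r"F:\Games\CounterStrike")
print(windows.parts)

# Windows path parsing also accepts forward slashes, so the same
# path can be written either way.
print(windows.as_posix())
print(PureWindowsPath("F:/Games/CounterStrike") == windows)
```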