When I save text to a SQL Server database, I run into charset / encoding problems.
The text contains the German umlauts ä, ö and ü. In myLittleAdmin, I see � instead of the proper letters.
My host only gives me access to the database through myLittleAdmin. How and where can I configure the charset / encoding?
Related
I have a legacy postgres 8.4 database that I am pulling foreign addresses out of. The encoding on the database is UTF-8 and most of the special characters appear fine as normal readable characters, e.g: Á, É, Í, Ó, Ü
However, there are thousands of records that use some other encoding for these characters. They all start with a non-printable STX char, followed by the sequence 6#x and then 1 or 2 hexadecimal chars.
E.g: CUAUHT6#xc9;MOC which should be: CUAUHTÉMOC
I tried using on-line "decoders" to identify this gibberish but did not have any luck. However, I was able to identify the following based on the city names and referencing postal codes:
existing   should be
--------   ---------
6#xc1;     Á
6#xc9;     É
6#xcd;     Í
6#xd3;     Ó
6#xdc;     Ü
6#xd;      " " (space)
The data appears to be stored in the database like this. Though admittedly, I am new to postgres so could be missing something fairly obvious.
Short of finding and replacing all instances, is this an identifiable encoding that has a standard decode function?
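One possible reading, offered as a guess rather than a confirmed diagnosis: the sequences look like hexadecimal numeric character references (such as &#xc9;) whose leading "&" was mangled into STX plus "6". The hex values in the table match the Unicode / Latin-1 code points of the expected letters (0xC9 is É, 0xDC is Ü). Under that assumption, a minimal Python sketch of the replacement could look like this. The function name and the exact pattern are hypothetical, and the special case for 6#xd; simply follows the table above (0x0D is really a carriage return).

```python
import re

# Assumed pattern: STX (0x02), the literal "6#x", 1-2 hex digits, then ";".
MANGLED_REF = re.compile(r"\x026#x([0-9a-fA-F]{1,2});")


def decode_mangled_refs(text: str) -> str:
    """Replace each mangled reference with the character at that code point."""

    def replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # The question maps 6#xd; (a carriage return) to a space; assumed here too.
        if code_point == 0x0D:
            return " "
        return chr(code_point)

    return MANGLED_REF.sub(replace, text)


print(decode_mangled_refs("CUAUHT\x026#xc9;MOC"))  # CUAUHTÉMOC
```

The pattern requires the STX byte, so ordinary text that happens to contain "6#x" is left alone. The same substitution could probably be done inside the database, but that is not shown here.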
I'm having trouble converting UTF-8 characters in a DB2 database to LaTeX format.
I have a DB2 database which is in codeset UTF-8 and codepage 1208. I have a field "lastname" that has some names that contain the East European special characters š ("s" with caron, Unicode code point U+0161, HTML entity &scaron;) and á ("a" with acute, Unicode code point U+00E1, HTML entity &aacute;).
My environment in the shell is LANG de_DE.utf8. I read the database field in Perl with the DBI module and want to convert the name to LaTeX format for printing, but it doesn't work for the character š.
I want to convert these characters as follows:
á -> \'{a}
š -> \v{s}
I'm able to convert á but not š with the following code ($tmp contains áš):
print TeX::Encode::encode('latex',$tmp);
This gives \'a?. \'a is correct but ? for the š is not.
When I save the field directly to a file and look at it with a hex editor it shows e1 1a. e1 is correct but 1a isn't (according to Latin-2 it should be b9).
Can someone please help me convert these East European names from a UTF-8 database to a universal LaTeX format for printing?
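A note on the symptom: 0x1a is the ASCII SUB (substitute) control character, which converters commonly emit for a character the target code page cannot represent, so the š was probably already lost before the LaTeX step (for example in a conversion to Latin-1, which has no š). Once the string really arrives as Unicode, the mapping itself is simple. Here is a minimal sketch in Python (not Perl) of one way to do it, using Unicode decomposition. The function name and the small accent table are assumptions for illustration and cover only a few accents.

```python
import unicodedata

# Combining mark -> LaTeX accent macro (illustrative subset only).
ACCENT_MACROS = {
    "\u0301": "\\'",  # combining acute accent
    "\u030c": "\\v",  # combining caron
    "\u0308": '\\"',  # combining diaeresis
}


def to_latex(text: str) -> str:
    """Rewrite accented letters as LaTeX accent macros, e.g. á -> \\'{a}."""
    out = []
    for ch in text:
        # NFD splits a precomposed letter into base letter + combining mark.
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENT_MACROS:
            base, mark = decomposed
            out.append(f"{ACCENT_MACROS[mark]}{{{base}}}")
        else:
            out.append(ch)
    return "".join(out)


print(to_latex("áš"))  # \'{a}\v{s}
```

Characters without an entry in the table pass through unchanged, so this only works as a starting point for a fuller mapping.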
I tried to convert a text file to a resx file using the following PowerShell command:
Resgen myFile.txt myFile.resx
The file myFile.txt contains German text.
For example, the text file contains the word “Längsseitenzufuhr”.
After conversion, the resx file contains “L�ngsseitenzufuhr” instead of “Längsseitenzufuhr”.
The “ä” is not preserved in the resx file.
I think it's an encoding issue.
I also tried changing the text file's encoding to UTF-8, but the same issue occurs.
Is there any way to get the correct word in the resx file?
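The � symptom is what you get when bytes written in a single-byte ANSI code page are read as UTF-8. The Python sketch below reproduces that and shows one way to re-encode the bytes as UTF-8 with a byte order mark. It assumes the original file was saved as Windows-1252, and whether Resgen needs the BOM to detect UTF-8 is also an assumption, not something confirmed here. Both function names are hypothetical.

```python
import codecs


def simulate_misread(text: str) -> str:
    """Encode as Windows-1252, then (wrongly) decode the bytes as UTF-8."""
    return text.encode("cp1252").decode("utf-8", errors="replace")


def to_utf8_with_bom(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from source_encoding to UTF-8 with a leading BOM."""
    return codecs.BOM_UTF8 + raw.decode(source_encoding).encode("utf-8")


print(simulate_misread("Längsseitenzufuhr"))  # L�ngsseitenzufuhr
```

To apply it to a file, read it in binary mode, pass the bytes to to_utf8_with_bom, and write the result back in binary mode.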
As a German, I know this problem. We have an international notation for our language-specific characters. Here is the list:
ä = ae
ö = oe
ü = ue
ß = ss (exists as lower case only)
Ä = Ae
Ö = Oe
Ü = Ue
If possible, just replace the German characters this way; it is allowed in our language.
I also found some articles, like this one, that solve the problem in a technical way. Converting the file to UTF-8 seems to be right.
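If you go the replacement route, the list above can be written as a small translation table. This is only a sketch in Python, and the names are made up for illustration.

```python
# Replacement table taken from the list above.
GERMAN_TRANSLITERATION = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def transliterate_german(text: str) -> str:
    """Replace umlauts and ß with their ASCII spellings."""
    return text.translate(GERMAN_TRANSLITERATION)


print(transliterate_german("Längsseitenzufuhr"))  # Laengsseitenzufuhr
```

Note that this changes the text itself, so it is a workaround rather than a fix for the encoding.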
What is an example of a character encoding which is not compatible with ASCII and why isn't it?
Also, what are other encodings which have upward compatibility with ASCII (except UTF and ISO8859, which I already know) and for what reason?
There are EBCDIC-based encodings that are not compatible with ASCII. For example, I recently encountered an email that was encoded using CP1026, aka EBCDIC 1026. If you look at its character table, letters and numbers are encoded at very different offsets than in ASCII. This was throwing off my email parser particularly because LF is encoded as 0x25 instead of as 0x0A in ASCII.
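To see the different offsets for yourself, here is a short Python sketch that compares the byte values of a few characters in ASCII and in CP1026. It relies on Python's built-in "cp1026" codec; the helper name is hypothetical.

```python
def byte_value(ch: str, encoding: str) -> int:
    """Return the single byte that encodes ch in the given encoding."""
    encoded = ch.encode(encoding)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {encoding}")
    return encoded[0]


for ch in ("A", "a", "0", "\n"):
    ascii_byte = byte_value(ch, "ascii")
    ebcdic_byte = byte_value(ch, "cp1026")
    print(f"{ch!r}: ASCII 0x{ascii_byte:02X}, CP1026 0x{ebcdic_byte:02X}")
```

A parser that scans for 0x0A to find line ends will miss the line feeds in such a message, which matches the problem described above.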
I use go.text in my project
https://godoc.org/code.google.com/p/go.text/encoding
I do not understand why ISO-8859-1 is missing.
I know I can easily transcode it byte -> rune -> UTF-8 (see "Unmarshal an ISO-8859-1 XML input in Go").
But I wonder if there is some encoding in go.text that is ISO-8859-1 under a different name. I know it has the following names:
ISO_8859-1:1987
ISO-8859-1
iso-ir-100
ISO_8859-1
latin1
l1
IBM819
CP819
csISOLatin1
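For reference, the byte -> rune -> UTF-8 route works because every ISO-8859-1 byte value equals the Unicode code point of the character it represents. Here is a small sketch of that idea, written in Python rather than Go purely for illustration; the function name is hypothetical.

```python
def latin1_to_str(data: bytes) -> str:
    """Decode ISO-8859-1 by mapping each byte to the same-valued code point."""
    return "".join(chr(byte) for byte in data)


raw = b"L\xe4ngsseitenzufuhr"  # ISO-8859-1 bytes, 0xE4 is ä
text = latin1_to_str(raw)
print(text)                    # Längsseitenzufuhr
print(text.encode("utf-8"))    # the same text re-encoded as UTF-8
```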
You can use Windows1252 in place of iso-8859-1.
This character encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range
http://en.wikipedia.org/wiki/Windows-1252
ISO-8859-1 assigns several control codes in this range. Windows-1252 has several characters, punctuation, arithmetic and business symbols assigned to these code points.
There is a chart with the differences here:
http://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
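If you want to check which bytes are affected before substituting one for the other, the comparison is easy to script. The sketch below uses Python's built-in "latin-1" and "cp1252" codecs (not go.text), and the function name is made up. Python's cp1252 codec leaves a few bytes in that range undefined, so they are decoded with errors="replace" here and count as different.

```python
def differing_bytes() -> list[int]:
    """Return the byte values that decode differently in the two encodings."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        as_latin1 = raw.decode("latin-1")
        # Undefined cp1252 bytes become U+FFFD instead of raising an error.
        as_cp1252 = raw.decode("cp1252", errors="replace")
        if as_latin1 != as_cp1252:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(f"0x{diffs[0]:02X}..0x{diffs[-1]:02X}, {len(diffs)} bytes differ")
```

So the substitution is safe as long as the input contains no bytes in the 80 to 9F range, or you are happy for them to be read as the Windows-1252 characters.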