wkhtmltopdf HTML URL encoding (German umlaut)

The following PDF conversion on the Linux console fails with "ContentNotFoundError":
wkhtmltopdf --page-size A4 --encoding utf-8 --viewport-size 1024x768 http://localhost/möja.html /tmp/test.pdf
The same problem occurs in lynx with the UTF-8 charset enabled:
The requested URL /möja.html was not found on this server.
The locale settings are all UTF-8, and the console displays German special characters correctly:
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Accessing the page in the browser, and with wkhtmltopdf on the development system (same Debian Wheezy distribution), works as expected. PDFs are created fine as long as the URL contains no German special characters. I can't find any differences between the systems.
Thank you for every hint!

Apparently the server doesn't expect to see UTF-8 encoded characters; it probably expects Latin-1. URLs cannot contain non-ASCII characters to begin with, so encode the umlaut in the URL with percent encoding according to the character encoding the server expects. The Latin-1 (ISO-8859-1) percent-encoded version would be:
http://localhost/m%F6ja.html
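For reference, both percent-encoded variants can be produced with Python's standard library; a minimal sketch, using the file name from the question:

from urllib.parse import quote

name = "möja.html"

# Percent-encode the umlaut as a single Latin-1 byte (what this server expects).
print(quote(name, encoding="latin-1"))  # m%F6ja.html

# The UTF-8 variant, which servers following RFC 3986 would expect instead.
print(quote(name, encoding="utf-8"))    # m%C3%B6ja.html

Trying both variants quickly shows which encoding the server actually uses to decode incoming URLs.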

Related

ELK: Filebeat with ANSI encoding

My Tomcat logs are ANSI-encoded (on Windows) and contain Chinese. When I used Filebeat to load those logs, the Chinese text was garbled. How can I deal with this? Does Filebeat have a setting that allows loading Chinese text from an ANSI encoding?
You need to tell Filebeat what the file's encoding is; there is a prospector option called encoding for this. You can use a tool like Notepad++, which will guess at the encoding, or you can examine the file with a hex editor and look at the BOM. Once you know the encoding, add it to the config file:
filebeat.prospectors:
- paths:
    - 'C:\logs\*.log'
  encoding: windows-1252
Reference: Encoding Descriptions
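If you'd rather not install a hex editor, a short script can check the file for the common BOMs; a minimal sketch, with a hypothetical log path:

import codecs

# Longest BOMs first, so a UTF-32 file is not misdetected as UTF-16.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8 (with BOM)"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

with open(r"C:\logs\catalina.log", "rb") as f:  # hypothetical path
    head = f.read(4)

for bom, name in BOMS:
    if head.startswith(bom):
        print("BOM found:", name)
        break
else:
    print("No BOM; the encoding has to be guessed from the content.")

Note that plain "ANSI" files carry no BOM at all, and on a Chinese-locale Windows system ANSI usually means code page 936 (GBK), so something like encoding: gbk is more likely the right value for this case than the windows-1252 shown above.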

ElFinder and NTFS UTF-16 file names

I use a WAMP server with ElFinder 2.x. It works fine, except that filenames are encoded in UTF-8 when uploaded, so they look like Список предприятий ВРК123.xlsx in Windows Explorer. That's OK, but it would be nice to be able to copy files with Unicode filenames into ElFinder's folder via Windows Explorer as well.
As far as I know, NTFS uses UTF-16. nao-pon answered here that one needs to set the encoding and locale connector options for multi-byte encodings. I've tried setting these options to 'UTF-16' and 'ru_RU.UTF-16', but then ElFinder cannot load the folder at all and gives an "Invalid backend configuration. Readable volumes not available" error.
UPD: it works fine with 'encoding' => 'CP1251', but then it doesn't list files with names like 한자.txt.
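The CP1251 result is expected: CP1251 is a single-byte Cyrillic code page, so Russian names fit but CJK characters cannot be represented in it at all. A quick illustration (Python, just to demonstrate the code-page limits):

# CP1251 can hold Cyrillic, so Russian file names round-trip fine.
"Список предприятий".encode("cp1251")

# CJK characters have no mapping in CP1251, so names like 한자.txt get dropped.
try:
    "한자".encode("cp1251")
except UnicodeEncodeError as e:
    print(e)  # 'charmap' codec can't encode character '\ud55c' ...

As far as I can tell, the connector works in UTF-8 internally and converts names to the configured volume encoding, so only a volume encoding that covers both alphabets can list both kinds of files.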

Convert file to UTF-8 without BOM using iconv on windows 8

I need to convert a bunch of files to:
UTF-8 without BOM
I have installed iconv:
http://www.gnu.org/software/libiconv/
But I can only find the plain UTF-8 option:
iconv --list
ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII
UTF-8
ISO-10646-UCS-2 UCS-2 CSUNICODE
So after I run the conversion, the file just ends up in plain UTF-8. Is it not possible to convert to UTF-8 without BOM using iconv?
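For what it's worth, plain iconv -t UTF-8 does not add a BOM to its output; a UTF-8 file only has a BOM when some tool explicitly wrote one. If a file already carries one (or the BOM of a source given explicitly as UTF-16LE/BE survives conversion as a character), a few lines suffice to strip it; file names here are hypothetical:

BOM = b"\xef\xbb\xbf"  # the UTF-8 byte order mark

with open("input.txt", "rb") as f:  # hypothetical input file
    data = f.read()

if data.startswith(BOM):
    data = data[len(BOM):]  # drop the three BOM bytes

with open("output.txt", "wb") as f:  # hypothetical output file
    f.write(data)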

What is the Dockerfile encoding?

While defining my Dockerfile, I got to this line:
...
MAINTAINER Ramón <ramon@example.com>
...
Which encoding should I use to save this file?
Should I escape non-ASCII characters?
Considering Docker is written in Go, and Go has native support for UTF-8, it is best to save a Dockerfile directly encoded in UTF-8.
That way, all characters (ASCII or not) are supported.
See "Dealing with encodings in Go".
Even though Go has good support for UTF-8 (and minimal support for UTF-16), it has no built-in support for any other encoding.
If you have to use other encodings (e.g. when dealing with user input), you have to use third party packages, like for example go-charset.
So here too, the Dockerfile itself is best encoded directly in UTF-8.
Update, July 2016: Docker 1.12-rc5 adds:
PR 23372: Support unicode characters in parseWords
PR 23234: Skip UTF-8 BOM bytes from Dockerfile and .dockerignore if exist
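To double-check that an existing Dockerfile really was saved as UTF-8, and to locate any stray byte that isn't, a small sketch along these lines works:

from pathlib import Path

raw = Path("Dockerfile").read_bytes()

try:
    text = raw.decode("utf-8")
except UnicodeDecodeError as e:
    # e.start is the byte offset of the first invalid sequence.
    print(f"Not valid UTF-8 at byte {e.start}: {raw[e.start:e.start + 4]!r}")
else:
    non_ascii = sorted({c for c in text if ord(c) > 127})
    print("Valid UTF-8; non-ASCII characters:", non_ascii or "none")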
Alternatively, you need to set the locale correctly (or remove the accent), check the encoding with a basic docker run -it container env, and then configure a correct encoding. The "Bible" on that topic is http://jaredmarkell.com/docker-and-locales/.

Postgres using cp1252 encoding?

I have a Postgres database that uses UTF-8 as its encoding and has client_encoding set to UTF8 as well. However, when I run a script file that should itself be UTF-8 encoded, the client seems to assume the encoding is really cp1252 and gives me the following error:
FEHLER: Zeichen mit Byte-Folge 0x81 in Kodierung "WIN1252" hat keine Entsprechung in Kodierung "UTF8"
(in English: ERROR: character with byte sequence 0x81 in encoding "WIN1252" has no equivalent in encoding "UTF8")
What is wrong here? Shouldn't the DB assume the file is in UTF-8 instead of trying to convert it from cp1252? I even added the line
SET client_encoding='UNICODE';
but that didn't change anything (as said, the database is already configured that way...).
I had to manually insert the BOM; then it worked. (What the heck!) Presumably the client tool only treats a script as UTF-8 when it sees the BOM, and otherwise falls back to the Windows ANSI code page.
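For completeness, that fix can be scripted: prepend the UTF-8 BOM so the client recognizes the script as UTF-8. A minimal sketch, with a hypothetical file name:

BOM = b"\xef\xbb\xbf"

with open("script.sql", "rb") as f:  # hypothetical script file
    data = f.read()

if not data.startswith(BOM):
    with open("script.sql", "wb") as f:
        f.write(BOM + data)  # write the BOM, then the original content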