ELK: filebeat with ANSI encoding - elastic-stack

My Tomcat logs are ANSI-encoded (on Windows) and contain Chinese. When I used Filebeat to load those logs, I found the Chinese text was garbled. How can I deal with it? Does Filebeat have a setting that allows loading Chinese from ANSI-encoded files?

You need to tell Filebeat what the file's encoding is; there is a prospector option called encoding for this. You can use a tool like Notepad++, which will guess at the encoding, or examine the file with a hex editor to look at the BOM. Once you know the encoding, you can add it to the config file.
filebeat.prospectors:
- paths:
    - 'C:\logs\*.log'
  encoding: windows-1252
Reference: Encoding Descriptions
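Since the logs in question contain Chinese, note that the ANSI code page on Simplified Chinese Windows systems is usually CP936 (GBK), so a sketch along these lines is likely closer to what you need (the path is illustrative; check Filebeat's encoding list for the exact value, e.g. gbk or gb18030):
filebeat.prospectors:
- paths:
    - 'C:\tomcat\logs\*.log'
  # Simplified Chinese Windows typically uses code page 936 (GBK);
  # gb18030 covers the larger superset, big5 is for Traditional Chinese.
  encoding: gbk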

Related

VSCODE how to reopen or save file with ASCII encoding

I have a text file that contains SQL querying a phrase in Hebrew.
I wrote the SQL in VSCode using UTF-8 encoding.
Now I have to run this SQL using a Teradata utility called bteq.
This utility uses ASCII encoding.
I tried to reopen or save the file in VSCode with a new encoding but couldn't find an ASCII encoding.
In the UltraEdit editor there is an ASCII encoding, and when I copy-paste the SQL into UltraEdit and save it with ASCII encoding, the SQL runs successfully.
Is there any way to use VSCode to save the file with ASCII encoding?
Update: 2022-05-23
Following the comments:
The reason I need this is that I work with a Teradata database.
The tool with which the SQL is written is called "Teradata Assistant".
Teradata Assistant version: 16.20.0.9 (2019-10-25)
Database version: 16.20.53.48
With this tool I can write the SQL with Hebrew letters and save it as a text file.
When I open the file with Teradata Assistant, it is displayed correctly with the Hebrew letters.
However, when I open the same file in VSCode using UTF-8, all the Hebrew letters are replaced with this symbol: �
Finally, when I open the file in Notepad with ANSI encoding, the Hebrew letters are displayed correctly.
I find it odd that VSCode can't do something that Notepad can do.
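A hedged note on the update: Hebrew text that Notepad renders correctly under ANSI but that shows as � when forced to UTF-8 is most likely stored in the Windows Hebrew ANSI code page (Windows-1255), not in 7-bit ASCII. If that is the case, VSCode can open it: click the encoding indicator in the status bar (or run "Reopen with Encoding" / "Save with Encoding" from the Command Palette) and pick the Hebrew Windows 1255 entry, or pin it per workspace in settings.json. A minimal sketch, assuming VSCode's built-in encoding identifiers:
// .vscode/settings.json (workspace-level, illustrative)
{
    "files.encoding": "windows1255"
}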

How to save to file non-ascii output of program in Powershell?

I want to run a program in PowerShell and write its output to a file with UTF-8 encoding.
However, I can't write non-ASCII characters properly.
I have already read many similar questions on Stack Overflow, but I still can't find an answer.
I tried both PowerShell 5.1.19041.1023 and PowerShell Core 7.1.3; they encode the output file differently, but the content is broken in the same way.
I tried simple programs in Python and Golang:
(Please assume that I can't change the source code of the programs.)
Python
print('Hello ąćęłńóśźż world')
Results:
python hello.py
Hello ąćęłńóśźż world
python hello.py > file1.txt
Hello ╣Šŕ│˝ˇťč┐ world
python hello.py | out-file -encoding utf8 file2.ext
Hello ╣Šŕ│˝ˇťč┐ world
On cmd:
python hello.py > file3.txt
Hello ����󜟿 world
Golang
package main

import "fmt"

func main() {
	fmt.Printf("Hello ąćęłńóśźż world\n")
}
Results:
go run hello.go:
Hello ąćęłńóśźż world
go run hello.go > file4.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
go run hello.go | out-file -encoding utf8 file5.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
On cmd it works ok:
go run hello.go > file6.txt
Hello ąćęłńóśźż world
You should set the OutputEncoding property of the console first.
In PowerShell, enter this line before running your programs:
[Console]::OutputEncoding = [Text.Encoding]::Utf8
You can then use Out-File with your encoding type:
py hello.py | Out-File -Encoding UTF8 file2.ext
go run hello.go | Out-File -Encoding UTF8 file5.txt
Note: These character-encoding problems only plague PowerShell on Windows, in both editions. On Unix-like platforms, UTF-8 is consistently used.[1]
Quicksilver's answer is fundamentally correct:
It is the character encoding stored in [Console]::OutputEncoding that determines how PowerShell decodes text received from external programs[2] - and note that it invariably interprets such output as text (strings).
[Console]::OutputEncoding by default reflects a console's active code page, which itself defaults to the system's active OEM code page, such as 437 (CP437) on US-English systems.
The standard chcp program also reports the active OEM code page, and while it can in principle also be used to change it for the active console (e.g., chcp 65001), this does not work from inside PowerShell, due to .NET caching the encodings.
Therefore, you may have to (temporarily) set [Console]::OutputEncoding to match the actual character encoding used by a given external console program:
While many console programs respect the active console code page (in which case no workarounds are required), some do not, typically in order to provide full Unicode support. Note that you may not notice a problem until you programmatically process such a program's output (meaning: capturing in a variable, sending through the pipeline to another command, redirection to a file), because such a program may detect the case when its stdout is directly connected to the console and may then selectively use full Unicode support for display.
Notable CLIs that do not respect the active console code page:
Python exhibits nonstandard behavior in that it uses the active ANSI code page by default, i.e. the code page normally only used by non-Unicode GUI-subsystem applications.
However, you can use $env:PYTHONUTF8=1 before invoking Python scripts to instruct Python to use UTF-8 instead (which then applies to all Python calls made from the same process); in v3.7+, you can alternatively pass command-line option -X utf8 (case-sensitive) as a per-call opt-in.
Go and also Node.js invariably use UTF-8 encoding.
The following snippet shows how to set [Console]::OutputEncoding temporarily as needed:
# Save the original encoding.
$orig = [Console]::OutputEncoding
# Work with console programs that use UTF-8 encoding,
# such as Go and Node.js
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
# Piping to Write-Output is a dummy operation that forces
# decoding of the external program's output, so that encoding problems would show.
go run hello.go | Write-Output
# Work with console programs that use ANSI encoding, such as Python.
# As noted, the alternative is to configure Python to use UTF-8.
[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP))
python hello.py | Write-Output
# Restore the original encoding.
[Console]::OutputEncoding = $orig
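If you would rather not switch [Console]::OutputEncoding back and forth, a hedged alternative for the Python case is the PYTHONUTF8 opt-in mentioned above, so that a single UTF-8 setting covers both programs (file names are the ones from the question):
# Tell Python 3.7+ to use UTF-8 for its standard streams,
# then decode everything as UTF-8 on the PowerShell side.
$env:PYTHONUTF8 = 1
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
python hello.py | Out-File -Encoding utf8 file2.ext
go run hello.go | Out-File -Encoding utf8 file5.txt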
Your own answer provides an effective alternative, but it comes with caveats:
Activating the Use Unicode UTF-8 for worldwide language support feature via Control Panel (or the equivalent registry settings) changes the code pages system-wide, which affects not only all console windows and console applications, but also legacy (non-Unicode) GUI-subsystem applications, given that both the OEM and the ANSI code pages are being set.
Notable side effects include:
Windows PowerShell's default behavior changes, because it uses the ANSI code page both to read source code and as the default encoding for the Get-Content and Set-Content cmdlets.
For instance, existing Windows PowerShell scripts that contain non-ASCII range characters such as é will then misbehave, unless they were saved as UTF-8 with a BOM (or as "Unicode", UTF-16LE, which always has a BOM).
By contrast, PowerShell (Core) v6+ consistently uses (BOM-less) UTF-8 to begin with.
Old console applications may break with 65001 (UTF-8) as the active OEM code page, as they may not be able to handle the variable-length encoding aspect of UTF-8 (a single character can be encoded by up to 4 bytes).
See this answer for more information.
[1] The cross-platform PowerShell (Core) v6+ edition uses (BOM-less) UTF-8 consistently. While it is possible to configure Unix terminals and thereby console (terminal) applications to use a character encoding other than UTF-8, doing so is rare these days - UTF-8 is almost universally used.
[2] By contrast, it is the $OutputEncoding preference variable that determines the encoding used for sending text to external programs, via the pipeline.
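A minimal sketch of the two directions described in [2] (findstr is only used as a convenient external program that echoes matching lines back):
# Text piped *to* an external program is encoded per $OutputEncoding;
# text coming *back* from it is decoded per [Console]::OutputEncoding.
$OutputEncoding           = [System.Text.UTF8Encoding]::new()
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
'Hello ąćęłńóśźż world' | findstr "world"   # round-trips intact when both are UTF-8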
The solution is to enable Beta: Use Unicode UTF-8 for worldwide language support, as described in What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?
Note: this solution may cause problems with legacy programs. Please read the answer by mklement0 and the answer by Quicksilver for details and alternative solutions.
I also found the explanation written by Ghisler helpful (source):
If you check this option, Windows will use codepage 65001 (Unicode UTF-8) instead of the local codepage like 1252 (Western Latin1) for all plain text files. The advantage is that text files created in e.g. the Russian locale can also be read in other locales like Western or Central Europe. The downside is that ANSI-only programs (most older programs) will show garbage instead of accented characters.
Also, PowerShell before version 7.1 has a bug when this option is enabled. If you enable it, you may want to upgrade to version 7.1 or later.
I like this solution because it only needs to be set once and it works; it brings consistent Unix-like UTF-8 behaviour to Windows. I hope I will not see any issues.
How to enable it:
Win+R → intl.cpl
Administrative tab
Click the Change system locale button
Enable Beta: Use Unicode UTF-8 for worldwide language support
Reboot
or alternatively via reg file:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"ACP"="65001"
"OEMCP"="65001"
"MACCP"="65001"

ElFinder and NTFS UTF-16 file names

I use a WAMP server and elFinder 2.x; it works fine except that filenames are encoded in UTF-8 when uploaded, so they look like Список предприятий ВРК123.xlsx in Windows Explorer. That's OK, but it would be nice to be able to copy files with Unicode filenames into elFinder's folder via Windows Explorer.
As far as I know, NTFS uses UTF-16. nao-pon answered here that one needs to set encoding and locale in the connector options for multi-byte encodings. I've tried setting these options to 'UTF-16' and 'ru_RU.UTF-16', but then elFinder cannot load the folder at all and gives an Invalid backend configuration. Readable volumes not available error.
UPD: it works fine with 'encoding' => 'CP1251', but it doesn't list files with names like 한자.txt.

Which is the Dockerfile encoding?

While defining my Dockerfile I got to this line:
...
MAINTAINER Ramón <ramon@example.com>
...
Which encoding shall I use to save this file?
Shall I escape non-ASCII characters?
Considering Docker is written in Go, and Go has native support for UTF-8, it is best to save a Dockerfile directly encoded in UTF-8.
That way, all characters (ASCII or not) are supported.
See "Dealing with encodings in Go".
Even though Go has good support for UTF-8 (and minimal support for UTF-16), it has no built-in support for any other encoding.
If you have to use other encodings (e.g. when dealing with user input), you have to use third party packages, like for example go-charset.
Here, it is best if the Dockerfile is directly encoded in UTF-8.
Update July 2016, docker 1.12-rc5 adds:
PR 23372: Support unicode characters in parseWords
PR 23234: Skip UTF-8 BOM bytes from Dockerfile and .dockerignore if exist
You need to set the locale correctly (or remove the accent); check the current encoding with a basic docker run -it container env and then put a correct encoding in place. The "Bible" on that is http://jaredmarkell.com/docker-and-locales/.
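For reference, a minimal sketch of the locale setup that article describes, for a Debian-based image (base image and locale are assumptions; adjust to your own):
FROM debian:bookworm-slim
# Generate and activate a UTF-8 locale so accented characters survive.
RUN apt-get update && apt-get install -y locales \
 && sed -i 's/^# *\(en_US.UTF-8\)/\1/' /etc/locale.gen \
 && locale-gen
ENV LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8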

wkhtmltopdf html url encoding (german umlaut)

Encoding of the following .pdf conversion on the Linux console fails with "ContentNotFoundError":
wkhtmltopdf --page-size A4 --encoding utf-8 --viewport-size 1024x768 http://localhost/möja.html /tmp/test.pdf
The same problem occurs in lynx with the UTF-8 charset enabled:
The requested URL /möja.html was not found on this server.
The locale settings are UTF-8, and typing the German special characters in the console works correctly.
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Accessing the page in the browser, and with wkhtmltopdf on the development system (same Debian Wheezy distribution), works as expected. PDFs are created fine when there are no German special characters in the URL. I can't find any differences.
Thank you for every hint!
Apparently the server doesn't expect to see UTF-8 encoded characters; it probably expects Latin-1. URLs cannot contain non-ASCII characters to begin with, so encode the umlaut in the URL with percent encoding according to the character encoding the server expects. The Latin-1 (ISO-8859-1) percent-encoded version would be:
http://localhost/m%F6ja.html
(For comparison, the UTF-8 percent-encoded form would be http://localhost/m%C3%B6ja.html.)