I am using the wkhtmltopdf.exe command-line tool to render HTML into PDF.
I am trying to generate a PDF with a custom header that uses Cyrillic characters, so I have a cmd command like this:
wkhtmltopdf.exe --header-center "Заказ в Австралию — Test" http://localhost/MyPage c:/1.pdf
The issue is that in the generated PDF the header looks like this: "?????? ? ??????? - Test"
I tried adding the --encoding utf-8 parameter, but that did not work for me.
Note that my PC is running Windows with only the English (US) system locale installed.
Are there any thoughts on how to solve this issue without installing any other system locale?
Old topic, but I hope my solution will help someone. Here is what helped me (I used a good clue from lit's comment above):
I created a .bat file with contents like this:
chcp 65001
C:\data\wkhtmltopdf.exe --footer-font-size "10" --footer-center "Сторінка [page] з [topage]" file:///C:\data\12345.html C:\data\12345.pdf
For my server-side solution, I called ProcessStartInfo with FileName = "cmd" and Arguments = "/c " + batchfile.
This did the job.
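The same server-side call can be sketched in PowerShell through the same .NET API (the batch file path here is hypothetical and stands in for the .bat file above):
$psi = New-Object System.Diagnostics.ProcessStartInfo
$psi.FileName  = 'cmd'
$psi.Arguments = '/c C:\data\makepdf.bat'   # hypothetical path to the .bat above
$psi.UseShellExecute = $false
[System.Diagnostics.Process]::Start($psi).WaitForExit()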
BUT: you still need to make sure that the server's language for non-Unicode programs is set to your language.
I want to run a program in PowerShell and write its output to a file with UTF-8 encoding.
However, I can't write non-ASCII characters properly.
I have already read many similar questions on Stack Overflow, but I still can't find an answer.
I tried both PowerShell 5.1.19041.1023 and PowerShell Core 7.1.3; they encode the output file differently, but the content is broken in the same way.
I tried simple programs in Python and Go.
(Please assume that I can't change the programs' source code.)
Python
print('Hello ąćęłńóśźż world')
Results:
python hello.py
Hello ąćęłńóśźż world
python hello.py > file1.txt
Hello ╣Šŕ│˝ˇťč┐ world
python hello.py | out-file -encoding utf8 file2.ext
Hello ╣Šŕ│˝ˇťč┐ world
On cmd:
python hello.py > file3.txt
Hello ���� world
Golang
package main
import "fmt"
func main() {
fmt.Printf("Hello ąćęłńóśźż world\n")
}
Results:
go run hello.go
Hello ąćęłńóśźż world
go run hello.go > file4.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
go run hello.go | out-file -encoding utf8 file5.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
On cmd it works ok:
go run hello.go > file6.txt
Hello ąćęłńóśźż world
You should set the OutputEncoding property of the console first.
In PowerShell, enter this line before running your programs:
[Console]::OutputEncoding = [Text.Encoding]::Utf8
You can then use Out-File with your encoding type:
py hello.py | Out-File -Encoding UTF8 file2.ext
go run hello.go | Out-File -Encoding UTF8 file5.txt
Note: These character-encoding problems only plague PowerShell on Windows, in both editions. On Unix-like platforms, UTF-8 is consistently used.[1]
Quicksilver's answer is fundamentally correct:
It is the character encoding stored in [Console]::OutputEncoding that determines how PowerShell decodes text received from external programs[2] - and note that it invariably interprets such output as text (strings).
[Console]::OutputEncoding by default reflects a console's active code page, which itself defaults to the system's active OEM code page, such as 437 (CP437) on US-English systems.
The standard chcp program also reports the active OEM code page, and while it can in principle also be used to change it for the active console (e.g., chcp 65001), this does not work from inside PowerShell, due to .NET caching the encodings.
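A quick way to observe that caching from inside a PowerShell session (a sketch; the encoding names shown depend on your locale):
[Console]::OutputEncoding    # e.g. IBM437 on a US-English system
chcp 65001                   # changes the console's code page...
[Console]::OutputEncoding    # ...but .NET still reports the old, cached encoding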
Therefore, you may have to (temporarily) set [Console]::OutputEncoding to match the actual character encoding used by a given external console program:
While many console programs respect the active console code page (in which case no workarounds are required), some do not, typically in order to provide full Unicode support. Note that you may not notice a problem until you programmatically process such a program's output (meaning: capturing in a variable, sending through the pipeline to another command, redirection to a file), because such a program may detect the case when its stdout is directly connected to the console and may then selectively use full Unicode support for display.
Notable CLIs that do not respect the active console code page:
Python exhibits nonstandard behavior in that it uses the active ANSI code page by default, i.e. the code page normally only used by non-Unicode GUI-subsystem applications.
However, you can use $env:PYTHONUTF8=1 before invoking Python scripts to instruct Python to use UTF-8 instead (which then applies to all Python calls made from the same process); in v3.7+, you can alternatively pass command-line option -X utf8 (case-sensitive) as a per-call opt-in. A short sketch follows this list.
Go and also Node.js invariably use UTF-8 encoding.
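As a sketch of the Python-side opt-in mentioned above, paired with UTF-8 decoding on the PowerShell side (reusing the question's hello.py example):
$env:PYTHONUTF8 = 1   # make Python itself emit UTF-8 (Python 3.7+)
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
python hello.py | Out-File -Encoding utf8 file2.ext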
The following snippet shows how to set [Console]::OutputEncoding temporarily as needed:
# Save the original encoding.
$orig = [Console]::OutputEncoding
# Work with console programs that use UTF-8 encoding,
# such as Go and Node.js
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
# Piping to Write-Output is a dummy operation that forces
# decoding of the external program's output, so that encoding problems would show.
go run hello.go | Write-Output
# Work with console programs that use ANSI encoding, such as Python.
# As noted, the alternative is to configure Python to use UTF-8.
[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP))
python hello.py | Write-Output
# Restore the original encoding.
[Console]::OutputEncoding = $orig
Your own answer provides an effective alternative, but it comes with caveats:
Activating the Use Unicode UTF-8 for worldwide language support feature via Control Panel (or the equivalent registry settings) changes the code pages system-wide, which affects not only all console windows and console applications, but also legacy (non-Unicode) GUI-subsystem applications, given that both the OEM and the ANSI code pages are being set.
Notable side effects include:
Windows PowerShell's default behavior changes, because it uses the ANSI code page both to read source code and as the default encoding for the Get-Content and Set-Content cmdlets.
For instance, existing Windows PowerShell scripts that contain non-ASCII range characters such as é will then misbehave, unless they were saved as UTF-8 with a BOM (or as "Unicode", UTF-16LE, which always has a BOM).
By contrast, PowerShell (Core) v6+ consistently uses (BOM-less) UTF-8 to begin with.
Old console applications may break with 65001 (UTF-8) as the active OEM code page, as they may not be able to handle the variable-length encoding aspect of UTF-8 (a single character can be encoded by up to 4 bytes).
See this answer for more information.
[1] The cross-platform PowerShell (Core) v6+ edition uses (BOM-less) UTF-8 consistently. While it is possible to configure Unix terminals and thereby console (terminal) applications to use a character encoding other than UTF-8, doing so is rare these days - UTF-8 is almost universally used.
[2] By contrast, it is the $OutputEncoding preference variable that determines the encoding used for sending text to external programs, via the pipeline.
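To illustrate footnote [2], a minimal sketch contrasting the two settings:
# Decoding of text received FROM external programs:
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
# Encoding of text sent TO external programs via the pipeline:
$OutputEncoding = [System.Text.UTF8Encoding]::new()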
The solution is to enable Beta: Use Unicode UTF-8 for worldwide language support, as described in What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?
Note: this solution may cause problems with legacy programs. Please read the answer by mklement0 and the answer by Quicksilver for details and alternative solutions.
I also found the explanation written by Ghisler helpful (source):
If you check this option, Windows will use codepage 65001 (Unicode UTF-8) instead of the local codepage like 1252 (Western Latin1) for all plain text files. The advantage is that text files created in e.g. a Russian locale can also be read in other locales like Western or Central Europe. The downside is that ANSI-only programs (most older programs) will show garbage instead of accented characters.
Also, PowerShell before version 7.1 has a bug when this option is enabled. If you enable it, you may want to upgrade to version 7.1 or later.
I like this solution because it's enough to set it once, and it works. It brings consistent Unix-like UTF-8 behaviour to Windows. I hope I will not see any issues.
How to enable it:
Win+R → intl.cpl
Administrative tab
Click the Change system locale button
Enable Beta: Use Unicode UTF-8 for worldwide language support
Reboot
Or, alternatively, via a .reg file:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"ACP"="65001"
"OEMCP"="65001"
"MACCP"="65001"
I'm adding the line -classpath/p ${installer:sys.userHome}/.comput/updates/latest.jar to the vmoptions file. (I tried both options: via the installer's 'Add VM option' action and via the launcher config.)
It works fine with an ASCII user name (including ones with spaces), but fails with non-ASCII user names (I'm testing with Russian). The vmoptions file looks fine to me: the path is correct and has the right encoding, CP1251 in my case.
However, the path passed to the JVM seems to have incorrectly decoded characters. The attached screenshot shows the actual path passed to the JVM (checked via YourKit) from the Install4j launcher,
and you may compare it with the screenshot where the non-ASCII path is passed via the command prompt.
The only workaround I have found is to substitute the path with its 8.3 Windows short path, but converting to that in pure Java seems very error-prone to me.
I appreciate your help very much!
Edit: Note this is an older question, from a time when the AWS CLI was v1. As noted in the comments, there are likely better solutions with v2.
I'm using AWS CLI on Windows to query items from DynamoDb. Some of these items include non-ASCII characters.
When the query hits those items, it dies with an error
'charmap' codec can't encode character u'\u010d' in position....
After hours of searching, I finally stumbled across a hackish workaround: under the AWSCLI\encodings directory, I copied utf_8.pyc over cp1252.pyc. This allows me to continue, but it is of course ugly.
Before resorting to that, I also tried setting environment variables such as LANG, LC_ALL, LC_CTYPE to various permutations of en-US.UTF-8 or similar, all with no effect that I could see.
Does anyone know how (or is it even possible) to tell AWS CLI to use a particular encoding?
Since you're using the command line interface, a change to the terminal's encoding scheme should fix the issue.
Type:
chcp 65001
in the console (for UTF-8; you may also try different encodings) and retry your operations.
Maybe this could help as well: I had the issue with AWS Translate when storing the results to a file (or a PowerShell variable).
It failed with an error:
aws translate translate-text --text "Investigation" --source-language-code "auto" --target-language-code "PL" >> a.txt
'charmap' codec can't encode character '\u015a' in position 1: character maps to <undefined>
Adding an environment variable fixes the problem:
set PYTHONIOENCODING=UTF-8
aws translate translate-text --text "Investigation" --source-language-code "auto" --target-language-code "PL" >> a.txt
The same in PowerShell:
PS C:\Users\???\Documents> $aws = aws translate translate-text --text "Request" --source-language-code "auto" --target-language-code "PL"
'charmap' codec can't encode character '\u015b' in position 4: character maps to <undefined>
PS C:\Users\???\Documents> exit
C:\Users\???\Documents>set PYTHONIOENCODING=UTF-8
C:\Users\???\Documents>powershell
Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
PS C:\Users\???\Documents> $aws = aws translate translate-text --text "Request" --source-language-code "auto"
--target-language-code "PL"
PS C:\Users\???\Documents> $aws
{
"TranslatedText": "Prośba",
"SourceLanguageCode": "en",
"TargetLanguageCode": "pl"
}
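Note that the exit / set / powershell round trip above can be avoided: PowerShell can set an environment variable for its own process directly, for example:
$env:PYTHONIOENCODING = 'UTF-8'
$aws = aws translate translate-text --text "Request" --source-language-code "auto" --target-language-code "PL"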
I reinstalled the AWS CLI using the upgraded MSI installers, which now use Python 3 instead of Python 2, and the unknown-encoding error is now gone.
I am using Git Bash on Windows 10, and set PYTHONIOENCODING=UTF-8 did not actually change the environment. Using export PYTHONIOENCODING=UTF-8 instead overcame the charmap error.
For Windows 10, with the CLI installed via Python:
Error:
'charmap' codec can't encode characters in position XX-XX: character maps to <undefined>
Solution: run the following command in the command line:
set PYTHONIOENCODING=UTF-8
I have a similar question to this:
ColdFusion, CFDirectory and the French
which was not given a satisfactory answer.
We have upgraded from ColdFusion 9 to ColdFusion 11. So far there have been no major problems, except the following:
When using cfdirectory to display file names that contain non-ASCII characters (e.g. accents, umlauts), we see the file names with replacement characters (�) instead of the correct UTF-8 equivalents. For example, a file named L’État, c’est moi.pdf is displayed as L�����tat, c���est moi.pdf.
We are confident that this is a ColdFusion issue, as nothing has changed but the ColdFusion version. With ColdFusion 9, cfdirectory worked fine when listing the same accented filenames. Our OS is Red Hat 7.0, and the file names are displayed correctly in the terminal with the ls command. I also created a quick PHP script to see whether PHP could read the directory correctly with the readdir function, and there were no problems there either; the filenames are rendered correctly.
So I believe this has to be a ColdFusion 11 issue. I have added the -Dfile.encoding=UTF-8 -Dencoding=UTF-8 parameters to the JVM settings via the ColdFusion Administrator server interface, but it made no difference.
Any suggestions on how to rectify this would be appreciated.
An example of the code used follows:
<cfdirectory
action="list"
directory="#ExpandPath( './' )#/pdfs"
listinfo="name"
name="qFile"
/>
<cfdump
var="#qFile#"
label="All Files"
/>
Have you tried setting the cfprocessingdirective tag?
<cfprocessingdirective pageencoding="utf-8">
CF 11 WikiDocs
Also, in the Chrome network inspector, make sure the encoding is being returned correctly, e.g.:
Content-Type:text/html; charset=UTF-8
If your environment is Linux, you need to have a clean UTF-8 configuration.
Please have a look here.
I had the same problem; I just added these lines to ~/.bashrc:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
After that, don't forget to restart your ColdFusion server:
sudo /opt/coldfusion11/cfusion/bin/coldfusion restart
Please see: Why are certain characters not being injected correctly to SQL Server from a CFQUERY?
Make sure your file is saved with Unicode UTF-8 encoding.
Also make sure your JVM arguments will process it as well: go to Admin > Server Settings > Java and JVM and add -Dfile.encoding=UTF-8 to the arguments.
I had the same problem; this solved my bug. In ~/.bashrc on Linux:
export LC_ALL="de_DE.UTF-8"
After the change, restart the ColdFusion application.
I've created a UTF-8 script for PowerShell with non-ASCII characters.
characters.ps1:
Write-Host "ç â ã á à"
When the script is run in the PowerShell console, it outputs the wrong characters.
However, if I type the characters directly in the console, they are shown as expected.
Does anyone know what causes this behavior?
The problem arose from a script I wrote that has hardcoded paths including non-ASCII characters. When I tried to pass one of these paths as an argument to cmdlets (in this case I was going to robocopy a folder), the command failed because it could not find the path (which was displayed wrongly on screen).
Changing the encoding of the script to UTF-8 with BOM solved the issue.
I was using Sublime Text with the EncodingHelper plugin to control the character set of the script. It was correctly set to UTF-8.
I changed the encoding of the script in Sublime Text to "UTF-8 with BOM" and the output was shown correctly.
I created the same script with Notepad++, which defaults to "UTF-8 with BOM", and the string was shown correctly in the console.
I changed the encoding of the script in Notepad++ to "UTF-8 without BOM" and it was shown incorrectly.
It seems PowerShell cannot correctly guess the encoding of UTF-8 files without a BOM.
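As a sketch, a BOM-less file can be re-saved with a BOM from PowerShell itself; in Windows PowerShell 5.1, -Encoding UTF8 always writes a BOM (the file name reuses the example above):
(Get-Content -Raw -Encoding UTF8 characters.ps1) |
    Set-Content -Encoding UTF8 -NoNewline characters.ps1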
In my case the problem was caused by creating a new PowerShell script with Visual Studio Code, whose default encoding is UTF-8 without BOM. Setting the encoding to "Windows 1252" solved the problem.
It seems that PowerShell can't handle UTF-8 without a BOM; it needs "Windows 1252" or "UTF-8 with BOM" encodings.
Try this before invoking your script:
$OutputEncoding = [Console]::OutputEncoding
There is a reliable way to detect UTF-8 without a BOM (https://unicodebook.readthedocs.io/guess_encoding.html). Like a lot of other little things, this seems to work better in PS 6. Even my beloved Emacs 25 for Windows gets the encoding wrong.
PS C:\users\admin> pwsh
PowerShell 6.1.0
Copyright (c) Microsoft Corporation. All rights reserved.
https://aka.ms/pscore6-docs
Type 'help' to get help.
PS C:\users\admin> "write-host 'ç â ã á à'" | set-content -Encoding utf8NoBOM accent.ps1
PS C:\users\admin> .\accent
ç â ã á à