Edit: Note this is an older question, from a time when the AWS CLI was v1 - as noted in the comments, there are likely better solutions with v2.
I'm using AWS CLI on Windows to query items from DynamoDb. Some of these items include non-ASCII characters.
When the query hits those items, it dies with an error:
'charmap' codec can't encode character u'\u010d' in position....
After hours of searching, I finally stumbled across a hackish workaround; under the AWSCLI\encodings directory, I copied utf_8.pyc over cp1252.pyc. This allows me to continue, but of course is ugly.
Before resorting to that, I also tried setting environment variables such as LANG, LC_ALL, LC_CTYPE to various permutations of en-US.UTF-8 or similar, all with no effect that I could see.
Does anyone know how (or is it even possible) to tell AWS CLI to use a particular encoding?
Since you're using the command line interface, a change to the terminal's encoding scheme should fix the issue.
Type:
chcp 65001
in the console (65001 selects UTF-8; you may also try other code pages) and retry your operations.
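For example, a full cmd session might look like this (the table name is a placeholder, and the scan command stands in for whatever query you are running):
chcp 65001
aws dynamodb scan --table-name MyTable --output json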
This may also help with the related issue of calling AWS Translate and storing the result in a file (or a PowerShell variable):
With error:
aws translate translate-text --text "Investigation" --source-language-code "auto" --target-language-code "PL" >> a.txt
'charmap' codec can't encode character '\u015a' in position 1: character maps to <undefined>
Adding an environment variable fixes the problem:
set PYTHONIOENCODING=UTF-8
aws translate translate-text --text "Investigation" --source-language-code "auto" --target-language-code "PL" >> a.txt
The same in PowerShell:
PS C:\Users\???\Documents> $aws = aws translate translate-text --text "Request" --source-language-code "auto" --target-language-code "PL"
'charmap' codec can't encode character '\u015b' in position 4: character maps to <undefined>
PS C:\Users\???\Documents> exit
C:\Users\???\Documents>set PYTHONIOENCODING=UTF-8
C:\Users\???\Documents>powershell
Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
PS C:\Users\???\Documents> $aws = aws translate translate-text --text "Request" --source-language-code "auto" --target-language-code "PL"
PS C:\Users\???\Documents> $aws
{
"TranslatedText": "Prośba",
"SourceLanguageCode": "en",
"TargetLanguageCode": "pl"
}
I've reinstalled the AWS CLI using the upgraded MSI installers, which now use Python 3 instead of Python 2, and the encoding error is now gone.
I am using Git Bash on Windows 10, and set PYTHONIOENCODING=UTF-8 did not actually change the environment. Using export PYTHONIOENCODING=UTF-8 instead overcame the charmap error.
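A minimal sketch of what that looks like in Git Bash (the DynamoDB command and table name are placeholders for whatever call was failing):
# In Git Bash, 'set VAR=value' does not export an environment variable; 'export' does
export PYTHONIOENCODING=UTF-8
aws dynamodb scan --table-name MyTable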
For Windows 10, where the CLI is installed via Python:
Error:
'charmap' codec can't encode characters in position XX-XX: character maps to <undefined>
Solution: run the following command on the command line:
set PYTHONIOENCODING=UTF-8
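If you want the setting to survive beyond the current console session, setx can store it persistently; note that setx only affects consoles opened afterwards, not the one you run it in:
rem current session only
set PYTHONIOENCODING=UTF-8
rem persist for future sessions (takes effect in newly opened consoles)
setx PYTHONIOENCODING UTF-8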
I want to run a program in PowerShell and write its output to a file with UTF-8 encoding.
However, I can't write non-ASCII characters properly.
I have already read many similar questions on Stack Overflow, but I still can't find an answer.
I tried both PowerShell 5.1.19041.1023 and PowerShell Core 7.1.3; they encode the output file differently, but the content is broken in the same way.
I tried simple programs in Python and Golang:
(Please assume that I can't change the source code of the programs.)
Python
print('Hello ąćęłńóśźż world')
Results:
python hello.py
Hello ąćęłńóśźż world
python hello.py > file1.txt
Hello ╣Šŕ│˝ˇťč┐ world
python hello.py | out-file -encoding utf8 file2.ext
Hello ╣Šŕ│˝ˇťč┐ world
On cmd:
python hello.py > file3.txt
Hello ���� world
Golang
package main

import "fmt"

func main() {
    fmt.Printf("Hello ąćęłńóśźż world\n")
}
Results:
go run hello.go:
Hello ąćęłńóśźż world
go run hello.go > file4.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
go run hello.go | out-file -encoding utf8 file5.txt
Hello ─ů─ç─Ö┼é┼ä├│┼Ť┼║┼╝ world
On cmd it works ok:
go run hello.go > file6.txt
Hello ąćęłńóśźż world
You should set the OutputEncoding property of the console first.
In PowerShell, enter this line before running your programs:
[Console]::OutputEncoding = [Text.Encoding]::Utf8
You can then use Out-File with your encoding type:
py hello.py | Out-File -Encoding UTF8 file2.ext
go run hello.go | Out-File -Encoding UTF8 file5.txt
Note: These character-encoding problems only plague PowerShell on Windows, in both editions. On Unix-like platforms, UTF-8 is consistently used.[1]
Quicksilver's answer is fundamentally correct:
It is the character encoding stored in [Console]::OutputEncoding that determines how PowerShell decodes text received from external programs[2] - and note that it invariably interprets such output as text (strings).
[Console]::OutputEncoding by default reflects a console's active code page, which itself defaults to the system's active OEM code page, such as 437 (CP437) on US-English systems.
The standard chcp program also reports the active OEM code page, and while it can in principle also be used to change it for the active console (e.g., chcp 65001), this does not work from inside PowerShell, due to .NET caching the encodings.
Therefore, you may have to (temporarily) set [Console]::OutputEncoding to match the actual character encoding used by a given external console program:
While many console programs respect the active console code page (in which case no workarounds are required), some do not, typically in order to provide full Unicode support. Note that you may not notice a problem until you programmatically process such a program's output (meaning: capturing in a variable, sending through the pipeline to another command, redirection to a file), because such a program may detect the case when its stdout is directly connected to the console and may then selectively use full Unicode support for display.
Notable CLIs that do not respect the active console code page:
Python exhibits nonstandard behavior in that it uses the active ANSI code page by default, i.e. the code page normally only used by non-Unicode GUI-subsystem applications.
However, you can use $env:PYTHONUTF8=1 before invoking Python scripts to instruct Python to use UTF-8 instead (which then applies to all Python calls made from the same process); in v3.7+, you can alternatively pass command-line option -X utf8 (case-sensitive) as a per-call opt-in.
Go and also Node.js invariably use UTF-8 encoding.
The following snippet shows how to set [Console]::OutputEncoding temporarily as needed:
# Save the original encoding.
$orig = [Console]::OutputEncoding
# Work with console programs that use UTF-8 encoding,
# such as Go and Node.js
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
# Piping to Write-Output is a dummy operation that forces
# decoding of the external program's output, so that encoding problems would show.
go run hello.go | Write-Output
# Work with console programs that use ANSI encoding, such as Python.
# As noted, the alternative is to configure Python to use UTF-8.
[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP))
python hello.py | Write-Output
# Restore the original encoding.
[Console]::OutputEncoding = $orig
Your own answer provides an effective alternative, but it comes with caveats:
Activating the Use Unicode UTF-8 for worldwide language support feature via Control Panel (or the equivalent registry settings) changes the code pages system-wide, which affects not only all console windows and console applications, but also legacy (non-Unicode) GUI-subsystem applications, given that both the OEM and the ANSI code pages are being set.
Notable side effects include:
Windows PowerShell's default behavior changes, because it uses the ANSI code page both to read source code and as the default encoding for the Get-Content and Set-Content cmdlets.
For instance, existing Windows PowerShell scripts that contain non-ASCII range characters such as é will then misbehave, unless they were saved as UTF-8 with a BOM (or as "Unicode", UTF-16LE, which always has a BOM).
By contrast, PowerShell (Core) v6+ consistently uses (BOM-less) UTF-8 to begin with.
Old console applications may break with 65001 (UTF-8) as the active OEM code page, as they may not be able to handle the variable-length encoding aspect of UTF-8 (a single character can be encoded by up to 4 bytes).
See this answer for more information.
[1] The cross-platform PowerShell (Core) v6+ edition uses (BOM-less) UTF-8 consistently. While it is possible to configure Unix terminals and thereby console (terminal) applications to use a character encoding other than UTF-8, doing so is rare these days - UTF-8 is almost universally used.
[2] By contrast, it is the $OutputEncoding preference variable that determines the encoding used for sending text to external programs, via the pipeline.
The solution is to enable Beta: Use Unicode UTF-8 for worldwide language support, as described in What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?
Note: this solution may cause problems with legacy programs. Please read the answer by mklement0 and the answer by Quicksilver for details and alternative solutions.
I also found the explanation written by Ghisler helpful (source):
If you check this option, Windows will use codepage 65001 (Unicode UTF-8) instead of the local codepage like 1252 (Western Latin1) for all plain text files. The advantage is that text files created in e.g. Russian locale can also be read in other locale like Western or Central Europe. The downside is that ANSI-Only programs (most older programs) will show garbage instead of accented characters.
Also, PowerShell before version 7.1 has a bug when this option is enabled. If you enable it, you may want to upgrade to version 7.1 or later.
I like this solution because it only needs to be set once and it just works. It brings consistent Unix-like UTF-8 behaviour to Windows. I hope I will not see any issues.
How to enable it:
Win+R → intl.cpl
Administrative tab
Click the Change system locale button
Enable Beta: Use Unicode UTF-8 for worldwide language support
Reboot
or alternatively via reg file:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"ACP"="65001"
"OEMCP"="65001"
"MACCP"="65001"
I am using the wkhtmltopdf.exe command-line tool to render HTML into PDF.
I am trying to generate a PDF with a custom header that uses Cyrillic symbols, so I have a cmd command like this:
wkhtmltopdf.exe --header-center "Заказ в Австралию — Test" http://localhost/MyPage c:/1.pdf
The issue is that in the generated PDF the header looks like this: "?????? ? ??????? - Test"
I tried adding the parameter --encoding utf-8, but this did not work for me.
Note that my PC is running Windows with only the English (US) system locale installed.
Are there any thoughts on how to solve this issue without installing any other system locale?
Old topic, but I hope my solution will help someone. Here is what helped me (using a good clue from lit's comment above):
I created a .bat file with text like this:
chcp 65001
C:\data\wkhtmltopdf.exe --footer-font-size "10" --footer-center "Сторінка [page] з [topage]" file:///C:\data\12345.html C:\data\12345.pdf
For my server-side solution I launched the batch file via ProcessStartInfo with FileName="cmd" and Arguments="/c " + batchfile.
This did the job.
BUT you still need to make sure that the server's language for non-Unicode programs is switched to your language.
I have a similar question to this:
ColdFusion, CFDirectory and the French
which was not given a satisfactory answer.
We have upgraded from ColdFusion 9 to ColdFusion 11. So far no major problems, except the following:
When using CFDirectory to display file names that contain non-ASCII characters (e.g. accents, umlauts), we see the file names with replacement characters � instead of the correct UTF-8 equivalents. For example, a file named L’État, c’est moi.pdf is displayed as L�����tat, c���est moi.pdf.
We are confident that this is a ColdFusion issue, as nothing has changed but the ColdFusion version. With ColdFusion 9, CFDirectory worked OK when listing the same accented filenames. Our OS is Red Hat 7.0, and the file names are also displayed correctly on the terminal with the ls command. I have also created a quick PHP script to check whether PHP can read the directory correctly with the readdir command, and there are no problems there either; the filenames are rendered correctly.
So I believe this has to be a ColdFusion 11 issue. I have added the -Dfile.encoding=UTF-8 -Dencoding=UTF-8 parameters to the JVM settings from the ColdFusion Administrator interface, but it made no difference.
Any suggestions on how to rectify this would be appreciated.
An example of the code used follows:
<cfdirectory
action="list"
directory="#ExpandPath( './' )#/pdfs"
listinfo="name"
name="qFile"
/>
<cfdump
var="#qFile#"
label="All Files"
/>
Have you tried setting the cfprocessingdirective tag?
<cfprocessingdirective pageencoding="utf-8">
CF 11 WikiDocs
Also, in the Chrome Network Inspector, make sure the encoding is being returned correctly, e.g.:
Content-Type:text/html; charset=UTF-8
If your environment is Linux, you need to have a clean UTF-8 configuration.
Please have a look here.
I had the same problem; I just added these lines to the file ~/.bashrc:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
After that, don't forget to restart your ColdFusion server:
sudo /opt/coldfusion11/cfusion/bin/coldfusion restart
Please see: Why are certain characters not being injected correctly to SQL Server from a CFQUERY?
Make sure your file is saved with encoding Unicode UTF-8.
Also make sure your JVM arguments will process that as well: Admin > Server Settings > Java and JVM. Add "-Dfile.encoding=UTF-8" to the Arguments.
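For reference, the JVM arguments entered in the Administrator UI end up on the java.args line of cfusion/bin/jvm.config (the path may vary per install); a hedged sketch, with the surrounding flags purely illustrative:
# cfusion/bin/jvm.config - other flags shown here are illustrative
java.args=-server -Xmx1024m -Dfile.encoding=UTF-8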
I had the same problem; this solved my bug.
In ~/.bashrc:
LC_ALL="de_DE.UTF-8"
This is on Linux; after the change, restart the ColdFusion application.
Is there a way to configure the password strength for the add-user.sh script in JBoss EAP 6?
There is no such way to define the rules, but it allows alphanumerics and special characters...
This was not possible, so somebody made a feature request for it in the community Wildfly project:
https://issues.jboss.org/browse/WFLY-1611
Wildfly 8 should have it, but it's not known when (if ever) this gets merged back to EAP 6.
Alternatively, you can bypass add-user and create the user by hand with any password for standalone mode. Of course, just because you can doesn't mean you should; password rules are there for security. Only do this for development, and certainly don't copy your install or configuration folder to production!
Go to <jboss-install>/standalone/configuration and open mgmt-users.properties.
Find the line with #$REALM_NAME= and note the value after the =, last $ excluded. In my case it is ManagementRealm.
If you have Linux, run the following (substitute your user, the value noted above and password):
echo -n '<user>:<realm>:<password>' | md5sum
If you have Windows, you will have to find your own utility to generate MD5 hashes (a PowerShell sketch is given after these steps). If you write the user/realm/password combo to a file to let an MD5 tool do its work, check that there is no newline character at the end of your file or the hash will be off. You also won't need the quotes in that case.
md5sum will output a hexadecimal value. Go back to mgmt-users.properties and add a line at the end:
<user>=<MD5 hash>
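A minimal PowerShell sketch for the Windows case, computing the MD5 of user:realm:password without a trailing newline (the user, realm and password below are placeholders):
# Placeholder credentials - substitute your own user, realm and password
$combo = 'admin:ManagementRealm:s3cret'
$bytes = [Text.Encoding]::UTF8.GetBytes($combo)
$md5   = [Security.Cryptography.MD5]::Create()
$hash  = ($md5.ComputeHash($bytes) | ForEach-Object { $_.ToString('x2') }) -join ''
# Append a line of the form <user>=<hash> to mgmt-users.properties
"admin=$hash"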
I am new to Linux and antiword.
I want to convert Word documents to PostScript.
I am able to create the file, but the PS file is not opening.
At the time of conversion it gives an error like:
"The combination PostScript and UTF-8 is not supported"
This utility, antiword, the latest version of which I found to be 0.37 (21 Oct. 2005), ships with a few Readme files.
It explains itself in its own History file as such:
The name comes from: "The antidote against people who send Microsoft(R) Word files to everybody, because they believe that everybody runs Windows(R) and therefore runs Word".
The same file says about its current version:
Beta release, for evaluation by the public.
And it also says, under the header Known Limitations:
[....]
2) Antiword doesn't show all the images included in a Word document.
[....]
7) PostScript output will not work in combination with UTF-8. It only works in combination with character sets ISO-8859-1, ISO-8859-2 and ISO-8859-5.
8) Antiword's error messages are not very helpful.
So here you are. The behavior you are experiencing is exactly as is documented. :-)
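Given limitation 7 above, a hedged workaround is to ask antiword for one of the supported character sets when producing PostScript output, for example (the file names are placeholders; check the man page for the exact mapping files shipped on your system):
antiword -p a4 -m 8859-1.txt document.doc > document.ps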
@Tushar: It's also recommended to read the included FAQ document.