cx_Freeze Windows console encoding bug

I have a Python 3.4 script which deals with Unicode characters, diacritics, etc.
The script works perfectly on Mac and Windows.
If I freeze it to a Windows executable (freezing on Windows!) with
python cxfreeze verifier.py -cOO --target-dir verifier
and try to run it, it gives me the following exception while performing output:
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
exec(code, m.__dict__)
File "C:\Users\me\Desktop\verifier\verifier.py", line 520, in <module>
main()
File "C:\Users\me\Desktop\verifier\verifier.py", line 484, in main
ConsoleManager.dynamic_print(MSG_VERIFYING_FILE.format(relativePath))
File "C:\Users\me\Desktop\verifier\verifier.py", line 230, in dynamic_print
ConsoleManager.print(message, end='\r')
File "C:\Users\me\Desktop\verifier\verifier.py", line 226, in print
print(message, end=end)
File "X:\Python34-x32\lib\encodings\cp866.py", line 19, in encode
UnicodeEncodeError: 'charmap' codec can't encode character '\u0456' in position 16: character maps to <undefined>
I wonder why 'cp866'? The script works with UTF-8 exclusively and there are no cp866 charset references at all!
It looks like cx_Freeze is trying to print a UTF-8 stream to the console as a cp866 stream.
How can I tell the cx_Freeze exe-creator script to perform all console output in UTF-8?
I will be glad of any help.
UPDATE: found http://sourceforge.net/p/cx-freeze/mailman/message/24126644/, maybe it is about the problem I encountered.

Try executing the following in your Windows console:
chcp 65001
set PYTHONIOENCODING=utf-8
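If you would rather not depend on the console settings, a minimal sketch is to re-wrap the standard streams in UTF-8 at the top of verifier.py, assuming sys.stdout still exposes its binary .buffer in the frozen executable (it normally does). This avoids the UnicodeEncodeError, though the console still needs chcp 65001 to display the characters correctly:
import io
import sys

# Re-wrap stdout/stderr so all console output is encoded as UTF-8;
# errors='replace' keeps the program from crashing on characters the
# console cannot represent.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8',
                              errors='replace', line_buffering=True)
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8',
                              errors='replace', line_buffering=True)

print('\u0456')  # no longer raises UnicodeEncodeError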

Related

Getting the following error when generating a language scorer on DeepSpeech

File "generate_scorer_package", line 1
SyntaxError: Non-UTF-8 code starting with '\xea' in file generate_scorer_package on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Before answering this question, I am going to make some assumptions:
Firstly, I believe you are following the DeepSpeech Playbook and are at the step in generating a kenlm.scorer file, as documented here
Secondly, I am going to assume that you are using a Python editor of some description, like PyCharm.
The error SyntaxError: Non-UTF-8 code starting with '\xea' in file generate_scorer_package on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details is not related to DeepSpeech; it is related to the Python encoding of the file that is being executed.
Python 3 assumes that the encoding of the .py file is UTF-8; however, some editors, particularly editors in other locales, can override this setting.
To force the file to UTF-8 encoding, add the following code to the top of the generate_scorer_package.py file:
# coding: utf8
NOTE: It MUST be at the top of the file
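For example, per PEP 263 the declaration must appear on the first or second line, so assuming the file starts with a shebang, the first two lines of generate_scorer_package.py would look like this (the rest of the file is unchanged):
#!/usr/bin/env python3
# coding: utf8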
Alternatively, identify where in your editor the encoding is set, and change it.
See also these Stack Overflow questions that are similar:
SyntaxError: Non-UTF-8 code starting with '\x92' in file D:\AIAssistant\build\gui.py on line 92, but no encoding declared;
SyntaxError: Non-UTF-8 code starting with '\x82'

Writing accented characters from user input to a text file Python 3.7

Hello, I have the following code snippet:
while True:
    try:
        entry = input("Input element: ")
        print(entry)
        with open(fileName, 'a', encoding='UTF-8') as thisFile:
            thisFile.write(entry)
    except KeyboardInterrupt:
        break
This one basically continuously gets input and writes it to a file until manually interrupted. However, when the user inputs something like Ñ, it outputs: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed. I explicitly put the utf-8 encoding and even tried latin-1, but still the same error. I have also put # -*- coding: utf-8 -*- at the top of my code and tried thisFile.write(entry.encode('utf-8')), but it still gives me the error.
Setting the following environment variables fixed it for me.
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
or another method is running it via:
PYTHONIOENCODING="UTF-8" python3 writetest.py
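If changing the environment is not an option, a minimal in-code sketch is possible, assuming the lone surrogates come from stdin being decoded with the wrong encoding (which is what the surrogateescape error handler produces) and that the terminal actually sends UTF-8:
# Recover the raw bytes that input() could not decode cleanly, then decode
# them with the encoding the terminal really uses (UTF-8 is an assumption
# here), so the resulting string can be written to the file safely.
raw = entry.encode('utf-8', errors='surrogateescape')
entry = raw.decode('utf-8', errors='replace')
The environment-variable fix above is still the cleaner solution, since it makes Python decode stdin correctly in the first place.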

Change Line Feeds from CRLF to LF in Eclipse

I recently noticed that the line feeds of files in my project are CRLF, but I want them as LF. (I get the following message from Git GUI:
"UTF-8 Unicode text, with CRLF line terminators")
How can I solve this problem?
Try this (it sets the delimiter used for new files):
Window -> Preferences -> General -> Workspace: New text file line delimiter
Or, to convert existing files, try:
File -> Convert Line Delimiters To -> Unix
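If you prefer to convert existing files outside Eclipse, a minimal Python sketch performs the same CRLF-to-LF conversion; the glob pattern below is only a placeholder example:
import glob

# Convert CRLF to LF in place for every matching file.
# "src/**/*.java" is a placeholder pattern, adjust it to your project.
for path in glob.glob("src/**/*.java", recursive=True):
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))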

Forcing UTF-8 over cp1252 (Python3)

I've written some code that makes use of the Biopython Entrez wrapper. The code was working fine on my previous Win10 laptop (Python 3.5.1), but I've just ported it to a new Win10 laptop with the same versions of Python and every package installed, and I'm now getting a decode error.
The traceback leads to a function that fetches text: it's attempting to decode the text using cp1252 when it should be using UTF-8. I know that similar questions have been asked, but none have dealt with this problem happening inside a package (Biopython in my case). Copying the UTF-8 encoding file in Python/lib and renaming it to cp1252.py solves the problem, but this obviously is not a long-term solution.
File "C:\Users\arjun\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 21715: character maps to <undefined>
Use the io module for reading if you're using Python 3.x (https://docs.python.org/2/library/io.html#io.open).
By default, it will use the encoding specified on its running platform. You can also specify your own encoding as explained in the docs.
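A minimal sketch of that suggestion passes the encoding explicitly instead of letting Windows fall back to cp1252; the file name below is a placeholder:
import io

# "entrez_result.xml" is a placeholder; the point is the explicit encoding,
# which overrides the Windows default of cp1252.
with io.open("entrez_result.xml", "r", encoding="utf-8") as handle:
    text = handle.read()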

python 3.0, how to make print() output unicode?

I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.
print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)
Output is (changing angle brackets to square brackets for readability):
sys.stdout encoding is "cp1252"
Traceback (most recent call last):
File "TestPrintEncoding.py", line 22, in [module]
print(str1)
File "C:\Python30\lib\io.py", line 1491, in write
b = encoder.encode(s)
File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101'
in position 4: character maps to [undefined]
Note that ü = \xfc = 252 gives no problem since it's upper ASCII. But ā = \u0101 is beyond 8-bits.
Anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.
Apologies, I gave you the program without the preamble. Before the 3 lines given, it starts like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
Unfortunately, the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!
The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python is handling it in a correct manner internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly.
See this thread for a full explanation:
http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html
You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8". I have written a page on my ordeal with this problem.
Check out the question and answer here, I think they have some valuable clues. Specifically, note the setdefaultencoding in the sys module, but also the fact that you probably shouldn't use it.
Here's a dirty hack:
# works
import os
os.system("chcp 65001 &")
print("юникод")
However, almost anything breaks it:
simply muting the output of the first line already breaks it:
# doesn't work
import os
os.system("chcp 65001 >nul &")
print("юникод")
checking for the OS type breaks it:
# doesn't work
import os
if os.name == "nt":
    os.system("chcp 65001 &")
print("юникод")
it doesn't even work inside the if block:
# doesn't work
import os
if os.name == "nt":
    os.system("chcp 65001 &")
    print("юникод")
But one can print with cmd's echo:
# works
import os
os.system("chcp 65001 & echo {0}".format("юникод"))
and here's a simple way to make this cross-platform:
# works
import os
def simple_cross_platrofm_print(obj):
    if os.name == "nt":
        os.system("chcp 65001 >nul & echo {0}".format(obj))
    else:
        print(obj)
simple_cross_platrofm_print("юникод")
but the trailing empty line that Windows' echo adds can't be suppressed.
The problem of displaying Unicode characters in Python on Windows is known. There is no official solution yet. The right thing to do is to use the WinAPI function WriteConsoleW. It is nontrivial to build a working solution, as there are other related issues. However, I have developed a package which tries to fix Python regarding this issue. See https://github.com/Drekin/win-unicode-console. You can also read there a deeper explanation of the problem. The package is also on PyPI (https://pypi.python.org/pypi/win_unicode_console) and can be installed using pip.
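A minimal usage sketch, assuming the package has been installed with pip install win_unicode_console, is to enable it before any printing:
import win_unicode_console

# Replaces the standard streams with objects that go through the
# WriteConsoleW/ReadConsoleW WinAPI calls, so Unicode prints work in cmd.exe.
win_unicode_console.enable()

print("юникод")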