Python 3 doesn't read unicode file on a new server - unicode

My webpages are served by a script that dynamically imports a bunch of files with
try:
    with open(filename, 'r') as f:
        exec(f.read())
except IOError: pass
(actually, can you suggest a better method of importing a file? I'm sure there is one.)
Sometimes the files have strings in different languages, like
# contents of language.ru
title = "Название"
Those were all saved as UTF-8 files. Python has no problem running the script from the command line or serving a page from my MacBook:
OK: [server command line] python3.0 page.py /index.ru
OK: http://whitebox.local/index.ru
but it throws an error when trying to serve a page from a server we just moved to:
157 try:
158     with open (filename, 'r') as f:
159         exec(f.read())
160 except IOError: pass
161
/usr/local/lib/python3.0/io.py in read(self=, n=-1)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 627: ordinal not in range(128)
All the files were copied from my laptop where they were perfectly served by Apache. What is the reason?
Update: I found out the default encoding for open() is platform-dependent so it was utf8 on my laptop and ascii on server. I wonder if there is a per-program function to set it in Python 3 (sys.setdefaultencoding is used in site module and then deleted from the namespace).

Use open(filename, 'r', encoding='utf8').
See Python 3 docs for open.
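A minimal sketch of that fix applied to the loader from the question. The helper name load_settings and the temporary language.ru file are made up for illustration; the only real change is the explicit encoding argument, which stops open() from falling back to the platform's locale encoding.

```python
import os
import tempfile


def load_settings(filename):
    """Execute a UTF-8 encoded settings file and return the names it defines.

    Hypothetical helper: it mirrors the try/except from the question but
    passes the encoding explicitly instead of relying on the locale.
    """
    namespace = {}
    try:
        with open(filename, 'r', encoding='utf-8') as f:
            exec(f.read(), namespace)
    except IOError:
        pass  # a missing file is silently skipped, as in the question
    namespace.pop('__builtins__', None)  # exec() adds this entry itself
    return namespace


# Demonstration with a throwaway file standing in for language.ru.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'language.ru')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('title = "Название"\n')
    settings = load_settings(path)

print(settings['title'])
```

Executing into a separate dictionary also keeps the imported names out of the script's own globals, which may be a partial answer to the "better method of importing a file" aside.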

Use the codecs library. I'm on Python 2.6.6, where the built-in open has no encoding argument, so I use codecs.open instead:
import codecs
with codecs.open('filename', 'r', encoding='UTF-8') as f:
    data = f.read()

You can use something like
with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
    data = f.read()
# make changes to the string 'data'
with open(fname + '.new', 'w',
          encoding="ascii", errors="surrogateescape") as f:
    f.write(data)
More information is in the Python Unicode documentation.
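To see why this approach is safe for files whose real encoding is unknown, here is a small sketch of the round trip on in-memory bytes. The sample line and the helper name edit_ascii_part are illustrative only. Note that the surrogateescape handler arrived in Python 3.1, so it would not be available on the Python 3.0 install from the question.

```python
def edit_ascii_part(raw, old, new):
    """Replace an ASCII substring in bytes of unknown encoding.

    Non-ASCII bytes are smuggled through as lone surrogates on decode and
    turned back into the identical bytes on encode.
    """
    text = raw.decode('ascii', errors='surrogateescape')
    text = text.replace(old, new)
    return text.encode('ascii', errors='surrogateescape')


# Sample data: UTF-8 bytes, although the function never needs to know that.
raw = 'title = "Название"\n'.encode('utf-8')
unchanged = edit_ascii_part(raw, 'no-match', 'x')
edited = edit_ascii_part(raw, 'title', 'heading')

print(unchanged == raw)
print(edited.decode('utf-8'))
```

The trade-off is that the decoded string is only safe for ASCII-level edits; printing it or encoding it without the same error handler would fail on the escaped bytes.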

Related

How to specify encoding in Alembic migrations

I am working to add Alembic migration to a legacy project. I am hoping to execute some raw sql, including some insert statements with unicode that I want to be encoded with UTF-8, but am getting a UnicodeDecodeError. To reproduce this error, I created this example:
def upgrade():
    op.execute("SELECT '𝔥𝔢𝔩𝔩𝔬'")
When I run this migration, I get:
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1263, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 324, in _execute_on_connection
self, multiparams, params, execution_options
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1462, in _execute_clauseelement
cache_hit=cache_hit,
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1678, in _execute_context
e, util.text_type(statement), parameters, None, None
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 8: ordinal not in range(128)
I confirmed that my Postgres database encoding is set to UTF-8. I also looked for an execution_options parameter that sets the encoding, but could not find one. I then abandoned op.execute() and created my own engine instance with sa.create_engine(url, encoding='utf-8'), which still raised an encoding error and still tried to use ascii.
Which Alembic, SQLAlchemy or Psycopg2 subsystem is expecting ascii, and is there a way to change that expectation either in Alembic's configuration or for a specific migration?
I'm happy to dig into the internal APIs and use a hacky solution, but would hope there is a straightforward way to do this that I have just not encountered in a couple hours searching the documentation.
Note: this project uses Python 2.7.16, and despite my efforts I do not have authorization to port it to Python3 yet.
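The numbers in the traceback are consistent with the statement arriving as a UTF-8 byte string (a plain Python 2 str literal) that is then implicitly decoded as ASCII by util.text_type(statement). The sketch below only reproduces that arithmetic; it is written in Python 3 syntax, the helper name first_non_ascii is made up, and it does not touch Alembic or SQLAlchemy at all.

```python
def first_non_ascii(data):
    """Return (position, byte) of the first byte the ascii codec rejects."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# The bytes a UTF-8 encoded source file would hold for the literal.
raw = "SELECT '𝔥𝔢𝔩𝔩𝔬'".encode('utf-8')
position, byte = first_non_ascii(raw)

print(position, hex(byte))  # 8 0xf0
```

Position 8 is the first byte after SELECT ', and 0xf0 is the lead byte of the four-byte UTF-8 sequence for 𝔥, matching the error message. If that reading is right, passing a unicode literal (u"...") from a source file with a UTF-8 coding declaration would be the first thing to try, though that is an assumption and not something confirmed here.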

Cannot access FTP directory with CP1250/CP852/UTF-8 encoding

I am trying to read in some files from the following directory structure:
/jc/06 Önéletrajzok/Profession/Előszűrés sablonok név szerint
But for some strange reason I cannot even enter the upper-level directories.
I have already tried PHP, Python 3.6 and Ruby without much luck. With PHP and Python I can at least CWD() as far as the /jc/06 Önéletrajzok/Profession part.
Here is my python code for reference:
from ftplib import FTP
ftp = FTP('hostname')
ftp.login('username','pwd')
ftp.cwd('jc') # Just for demonstration purposes as step by step
ftp.cwd('06 Önéletrajzok')
ftp.cwd('Profession')
print(ftp.nlst()[2]) # Which gives: 'ElÅ\x91szűrés sablonok név szerint'
# But when I try:
ftp.cwd('ElÅ\x91szűrés sablonok név szerint')
# Or:
ftp.cwd('Előszűrés sablonok név szerint')
# It gives:
# UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151' in position 6: ordinal not in range(256)
# So I am trying encoding CP1250 or CP852 (for Hungarian)
dir = 'Előszűrés sablonok név szerint'.encode('cp852') # which gives: b'El\x8bsz\xfbr\x82s sablonok n\x82v szerint'
ftp.cwd(dir.decode('utf-8'))
# and it gives the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 2: invalid start byte
I am starting to give up on this one; I don't know how to access those files. The directory structure was created with Windows laptops accessing a Synology file server.
I have already tried with ftp.encoding = "utf-8" too.
Any ideas?
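One observation that may help: the 'Å\x91' fragment in the listing is exactly what the two UTF-8 bytes of 'ő' look like when decoded as latin-1, which is the default encoding of ftplib in Python 3.6. That suggests, without proving it, that the server sends UTF-8 names. The sketch below shows the conversion in both directions on plain strings; the function names are made up, and whether the server accepts the resulting name is an assumption that has to be tested against the real host.

```python
def utf8_as_latin1(name):
    """Return `name` as it appears when its UTF-8 bytes are read as latin-1.

    A string in this form contains only code points below 256, so a
    latin-1 connection can encode it, and the bytes on the wire are the
    UTF-8 bytes of the original name.
    """
    return name.encode('utf-8').decode('latin-1')


def latin1_as_utf8(garbled):
    """Undo utf8_as_latin1, e.g. for names returned by a listing."""
    return garbled.encode('latin-1').decode('utf-8')


wanted = 'Előszűrés sablonok név szerint'
wire_name = utf8_as_latin1(wanted)

print(repr(utf8_as_latin1('ő')))
print(latin1_as_utf8(wire_name) == wanted)
```

Under that assumption, ftp.cwd(utf8_as_latin1('Előszűrés sablonok név szerint')) would avoid the UnicodeEncodeError, because every character of the converted name fits in latin-1.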

How to set the default encoding in a buildout script, or during virtualenv creation?

I have a Plone project which is created by a buildout script and needs a default encoding of utf-8. This is usually done in the sitecustomize.py file of the Python installation. Since there is a virtualenv, I'd like to generate this file automatically, to contain something like:
import sys
sys.setdefaultencoding('utf-8')
After generation I have two empty sitecustomize.py files - one in parts/instance/, and one in parts/buildout; but none of these seems to be used (I couldn't find them in sys.path).
I tried zopepy:
>>> from os.path import join, isfile
>>> from pprint import pprint
>>> import sys
>>> pprint([p for p in sys.path
... if isfile(join(p, 'sitecustomize.py'))
... ])
and found another one in my local lib/python2.7/site-packages/ directory which looks good; but it doesn't work:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
This directory sits near the end of the sys.path, because I needed to add it by an extra-paths entry (to get another historical package).
Any pointers? Thank you!
System information: CentOS 7, Python 2.7.5
Edit:
I deleted those two empty sitecustomize.py files; now I have a default encoding of utf-8 in the zopepy session but still ascii in Plone. This surprises me, because I have this in my buildout script:
[zopepy]
recipe=zc.recipe.egg
eggs = ${instance:eggs}
extra-paths = ${instance:extra-paths}
interpreter = zopepy
scripts = zopepy
To debug this, I created a little function which I added to my code, and which displays a little information about relevant modules in the sys.path:
import sys
from os.path import join, isdir, isfile

def sitecustomize_info():
    plen = len(sys.path)
    print '-' * 79
    print 'sys.path has %(plen)d entries' % locals()
    for tup in zip(range(1, plen+1), sys.path):
        nr, dname = tup
        if isdir(dname):
            for fname in ('site.py', 'sitecustomize.py'):
                if isfile(join(dname, fname)):
                    print '%(nr)4d. %(dname)s/%(fname)s' % locals()
            spname = join(dname, 'site-packages', 'sitecustomize.py')
            if isfile(spname):
                print '%(nr)4d. %(spname)s' % locals()
        else:
            print '? %(dname)s is not a directory' % locals()
    print '-' * 79
Output:
sys.path has 303 entries
8. /usr/lib64/python2.7/site-packages/sitecustomize.py
295. /opt/zope/instances/wnzkb/lib/python2.7/site-packages/sitecustomize.py
? /usr/lib64/python27.zip is not a directory
298. /usr/lib64/python2.7/site.py
? /usr/lib64/python2.7/lib-tk is not a directory
? /usr/lib64/python2.7/lib-old is not a directory
303. /usr/lib/python2.7/site-packages/sitecustomize.py
All sitecustomize.py files look the same (switching to utf-8), and I didn't tweak site.py (for now; if everything else fails, I might need to.)
If you really want/need to use the sitecustomize.py trick, you could include this part in your buildout:
[fixencode]
recipe = plone.recipe.command
stop-on-error = yes
update-command = ${fixencode:command}
command =
    SITE_PACKAGES=$(${buildout:executable} -c \
      'from distutils.sysconfig import get_python_lib;print(get_python_lib())')
    cat > $SITE_PACKAGES/../sitecustomize.py << EOF
    #!${buildout:executable} -S
    import sys
    sys.setdefaultencoding('utf-8')
    EOF
This writes sitecustomize.py into the directory just above the site-packages folder of your virtualenv.
It looks like sitecustomize.py is no longer found unless it is placed in the global lib directories (Discussion "deleting setdefaultencoding in site.py is evil" (2009), Tracker ticket "sitecustomize.py not found"), and this was done on purpose (!).
This is to prevent users from overriding the default encoding which might have been adjusted by some library. Libraries shouldn't do that, however.
Thus, whoever needs to set the default encoding is tempted to do it globally, which looks like a very silly idea to me. I'd consider it much more reasonable to set this in my virtual environment.
Unfortunately the sitecustomize.py modules in a virtualenv seem to be silently ignored; but it is possible to edit the local site.py. Here is a little sed script:
# vv-------1------vv vv---2---vv vv--------3------vv vv-----------4------------vv
sed --in-place -e 's,^\( encoding =\) \("ascii"\) \(# Default value\) \(set by _PyUnicode_Init()\),\1 "utf-8" \3 \2 \4,' lib/python2.7/site.py

Python notebook fail to load properly

I am using Anaconda version 2.76. It was working fine until today, when the notebook page stopped loading properly and none of the features were responsive. After some research I think it is some coding error, but since I am not really a computing kind of guy, I don't know exactly what went wrong or how to fix it. Below is the error message I received. Please lend me a hand. Thanks a lot.
HTTPRequest(protocol='http', host='127.0.0.1:8888', method='GET', uri='/static/base/images/favicon.ico', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'connection': 'keep-alive', 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4', 'Accept-Encoding': 'gzip,deflate,sdch', 'host': '127.0.0.1:8888', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'})
Traceback (most recent call last):
File "D:\Anaconda\lib\site-packages\tornado\web.py", line 1218, in _when_complete
callback()
File "D:\Anaconda\lib\site-packages\tornado\web.py", line 1239, in _execute_method
self._when_complete(method(*self.path_args, **self.path_kwargs),
File "D:\Anaconda\lib\site-packages\IPython\html\base\handlers.py", line 318, in get
mime_type, encoding = mimetypes.guess_type(abspath)
File "D:\Anaconda\lib\mimetypes.py", line 297, in guess_type
init()
File "D:\Anaconda\lib\mimetypes.py", line 358, in init
db.read_windows_registry()
File "D:\Anaconda\lib\mimetypes.py", line 258, in read_windows_registry
for subkeyname in enum_types(hkcr):
File "D:\Anaconda\lib\mimetypes.py", line 249, in enum_types
ctype = ctype.encode(default_encoding) # omit in 3.x!
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 1: ordinal not in range(128)
2014-5-12 16:43:45.456 [tornado.access] ERROR | 500 GET /static/base/images/favicon.ico (127.0.0.1) 97.00ms
This is a known issue.
I've solved the same problem using the following temporary modification of Anaconda/Lib/mimetypes.py, lines 252-253 (as proposed here).
try:
    ctype = ctype.encode(default_encoding) # omit in 3.x!
except UnicodeEncodeError:
    pass
except Exception: #<--
    pass #<--
else:
    yield ctype

PUT binary data using requests lib

I need to create a small WebDAV client that just uploads files to the server.
I've found the "requests" library, which seems very easy to use, but I'm not able to use it properly.
The client should transfer binary files, so I've used the example below:
>>> url = 'http://IPADDR/webdav'
>>> files = {'report.xls': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
from http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file.
For me it's not working, I have the following error:
File ".../site-packages/requests/packages/urllib3/connectionpool.py", line 260, in _make_request
conn.request(method, url, **httplib_request_kw)
File ".../httplib.py", line 941, in request
self._send_request(method, url, body, headers)
File ".../httplib.py", line 975, in _send_request
self.endheaders(body)
File ".../httplib.py", line 937, in endheaders
self._send_output(message_body)
File ".../httplib.py", line 795, in _send_output
msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 147: ordinal not in range(128)
Should the input file be encoded somehow? (I didn't find anything related in the "requests" documentation.)
After some debugging, I've actually found what's happening.
I was able to fix the issue by removing the following import in my script:
from __future__ import unicode_literals
This import seems to cause unwanted string conversions in urllib3 (which requests relies on).
As requests' author explained, this issue should be filed against urllib3.