How to specify encoding in Alembic migrations

I am working to add Alembic migrations to a legacy project. I am hoping to execute some raw SQL, including some insert statements with Unicode that I want encoded as UTF-8, but I am getting a UnicodeDecodeError. To reproduce the error, I created this example:
def upgrade():
    op.execute("SELECT '𝔥𝔢𝔩𝔩𝔬'")
When I run this migration, I get:
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1263, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 324, in _execute_on_connection
self, multiparams, params, execution_options
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1462, in _execute_clauseelement
cache_hit=cache_hit,
File "virtual_env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1678, in _execute_context
e, util.text_type(statement), parameters, None, None
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 8: ordinal not in range(128)
I confirmed that my Postgres database encoding is set to UTF-8. I also tried to work out whether any execution_options parameters exist to set the encoding, but I was not able to find one. I also tried abandoning op.execute() and creating my own engine instance with sa.create_engine(url, encoding='utf-8'), which mysteriously still gave me an encoding error complaining about ascii.
Which Alembic, SQLAlchemy or Psycopg2 subsystem is expecting ascii, and is there a way to change that expectation either in Alembic's configuration or for a specific migration?
I'm happy to dig into the internal APIs and use a hacky solution, but would hope there is a straightforward way to do this that I have just not encountered in a couple hours searching the documentation.
Note: this project uses Python 2.7.16, and despite my efforts I do not have authorization to port it to Python 3 yet.
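For reference, the traceback ends at util.text_type(statement), which on Python 2 is unicode(), so my working theory (unverified) is that the byte-string SQL is being decoded with the default ascii codec. A minimal sketch of the workaround I'm testing: declare the migration file's encoding and pass the SQL as a unicode literal so no implicit decode happens:

# -*- coding: utf-8 -*-
from alembic import op


def upgrade():
    # A unicode literal reaches SQLAlchemy already decoded, so
    # util.text_type() has nothing to decode with 'ascii'.
    op.execute(u"SELECT '𝔥𝔢𝔩𝔩𝔬'")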

Related

How do I fix an encoding error in order to upload a 5GB text file using psql? (ERROR: invalid byte sequence for encoding "UTF8": 0x92)

I am trying to upload a series of tables (.txt files) into a PostgreSQL database that runs on my Windows 10 desktop. I use psql to upload the files. I have successfully uploaded a couple of tables, but the largest one (5GB with over 20 million rows) is giving me trouble:
databasename=# \copy table1 FROM 'C:\Users\tablename.txt' DELIMITER ',' CSV HEADER;
ERROR: character with byte sequence 0x9d in encoding "WIN1252" has no equivalent in encoding "UTF8"
CONTEXT: COPY table1, line 581330
I found an answer here which suggested I check the client encoding...
databasename=# SHOW client_encoding;
client_encoding
-----------------
WIN1252
(1 row)
and then change it, which I tried:
databasename=# SET CLIENT_ENCODING TO 'utf8';
SET
I then try the same copy command again and get the following error:
ERROR: invalid byte sequence for encoding "UTF8": 0x92
CONTEXT: COPY table1, line 206051
I've read a little about 0x92 here. It sounds like there is a character in the file which is not valid UTF-8, so the \copy command fails when it reaches it.
Some background:
I was able to upload about 1 million rows into SQL Server 2019 (free version) using the SQL Server Import and Export Wizard. (I stopped the import because it was taking too long.) I was also able to view the file in R using read.csv. Not sure if any of this is helpful. Thank you all in advance.
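For what it's worth, one workaround would be to re-encode the file before loading it. A minimal streaming sketch in Python, assuming the file really is WIN1252 apart from stray bytes like 0x9d (which errors='replace' substitutes with U+FFFD); the output file name is hypothetical:

import io

src = r'C:\Users\tablename.txt'        # path from the \copy command above
dst = r'C:\Users\tablename_utf8.txt'   # hypothetical output file

# Stream line by line so the 5 GB file never sits in memory at once.
with io.open(src, 'r', encoding='cp1252', errors='replace') as fin:
    with io.open(dst, 'w', encoding='utf-8') as fout:
        for line in fin:
            fout.write(line)

Afterwards, the same \copy with client_encoding set to UTF8 should accept the re-encoded file.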

Does the postgres COPY function support utf 16 encoded files?

I am trying to use the PostgreSQL COPY function to insert a UTF-16 encoded CSV into a table. However, when running the below query:
COPY temp
FROM 'C:\folder\some_file.csv'
WITH (
    DELIMITER E'\t',
    FORMAT csv,
    HEADER);
I get the error below:
ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY temp, line 1
SQL state: 22021
and when I run the same query but add the encoding setting ENCODING 'UTF-16' or ENCODING 'UTF 16' to the WITH block, I get the error below:
ERROR: argument to option "encoding" must be a valid encoding name
LINE 13: ENCODING 'UTF 16' );
^
SQL state: 22023
Character: 377
I've looked through the Postgres documentation to try to find the correct encoding name, but haven't managed to find anything. Is this because the COPY function does not support UTF-16 encoded files? I would have thought that this would almost certainly have been possible!
I'm running Postgres 12 on Windows 10 Pro.
Any help would be hugely appreciated!
No, you cannot do that.
UTF-16 is not in the list of supported encodings.
PostgreSQL will never support an encoding that is not an extension of ASCII.
You will have to convert the file to UTF-8.
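A minimal conversion sketch in Python, reusing the path from the question (the output name is hypothetical); the 'utf-16' codec also consumes the 0xFF 0xFE byte order mark that caused the first error:

import io

# Re-encode the UTF-16 CSV as UTF-8 so COPY can read it.
with io.open(r'C:\folder\some_file.csv', 'r', encoding='utf-16') as fin:
    with io.open(r'C:\folder\some_file_utf8.csv', 'w', encoding='utf-8') as fout:
        for line in fin:
            fout.write(line)

Then point the COPY statement at the re-encoded file.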

gcloud crashed (UnicodeEncodeError): 'ascii' codec can't encode character u'\xe7' in position 13: ordinal not in range(128)

Welcome to the Google Cloud SDK! Run "gcloud -h" to get the list of
available commands.
C:\Program Files (x86)\Google\Cloud SDK>gcloud init
Welcome! This command will take you through the configuration of gcloud.
Your current configuration has been set to: [default]
You can skip diagnostics next time by using the following flag:
gcloud init --skip-diagnostics
Network diagnostic detects and fixes local network connection issues.
Checking network connection...done. Reachability Check passed. Network
diagnostic (1/1 checks) passed.
ERROR: gcloud crashed (UnicodeEncodeError): 'ascii' codec can't encode
character u'\xe7' in position 13: ordinal not in range(128)
If you would like to report this issue, please run the following
command: gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
I don't know what to do... I just want to use the SDK, but I can't init it...
Have a look at the file which gave the error, and edit the .py file to add the following lines:
import sys
reload(sys)                       # restores setdefaultencoding(), which site.py deletes at startup
sys.setdefaultencoding('utf8')    # make 'utf8' the process-wide default codec
Try editing the code in google-cloud-sdk/lib/third_party/socks/__init__.py (google-cloud-sdk is the archive you downloaded to install the Google Cloud SDK) at line 262:
req = req + struct.pack(">H", destport)
to
if isinstance(req, unicode):
    req = req.encode('UTF-8')
req = req + struct.pack(">H", destport)
reference: https://c11e.wodemo.com/gcloud-crashed-unicodedecodeerror
For me the fix was removing an accent ('é') from a folder in the path of my project. Hope it can help someone, since I didn't find this solution after googling for hours.
I kept getting this similar error every time I ran a gcloud command after a crash:
ERROR: gcloud crashed (UnicodeDecodeError): 'utf8' codec can't decode byte 0xa4 in position 1: invalid start byte
The solution was to delete this file:
~/.config/gcloud/gce
Don't ask me why that works or what that file does, I don't know (if you do, please let me know), but it gets recreated on the next command run and it fixed my issue.
In my case, it was a special character in the folders of the current directory. After changing the current directory, it worked!
As you've noticed, the error is due to a non-ASCII character (u'\xe7', i.e. 'ç') in the username. As a workaround, you can set the CLOUDSDK_CONFIG environment variable to a path that contains only ASCII characters.

Python notebook fail to load properly

I am using Anaconda version 2.76. It was working fine until today, when the notebook page stopped loading properly: none of the features were responsive. After some research I think it is some encoding error, but since I am really not a computing kind of guy, I don't know where exactly it went wrong or how to fix it. Below is the error message I received. Please lend me a hand. Thanks a lot.
HTTPRequest(protocol='http', host='127.0.0.1:8888', method='GET', uri='/static/base/images/favicon.ico', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'keep-alive', 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4', 'Accept-Encoding': 'gzip,deflate,sdch', 'Host': '127.0.0.1:8888', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'})
Traceback (most recent call last):
  File "D:\Anaconda\lib\site-packages\tornado\web.py", line 1218, in _when_complete
    callback()
  File "D:\Anaconda\lib\site-packages\tornado\web.py", line 1239, in _execute_method
    self._when_complete(method(*self.path_args, **self.path_kwargs),
  File "D:\Anaconda\lib\site-packages\IPython\html\base\handlers.py", line 318, in get
    mime_type, encoding = mimetypes.guess_type(abspath)
  File "D:\Anaconda\lib\mimetypes.py", line 297, in guess_type
    init()
  File "D:\Anaconda\lib\mimetypes.py", line 358, in init
    db.read_windows_registry()
  File "D:\Anaconda\lib\mimetypes.py", line 258, in read_windows_registry
    for subkeyname in enum_types(hkcr):
  File "D:\Anaconda\lib\mimetypes.py", line 249, in enum_types
    ctype = ctype.encode(default_encoding)  # omit in 3.x!
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 1: ordinal not in range(128)
2014-05-12 16:43:45.456 [tornado.access] ERROR | 500 GET /static/base/images/favicon.ico (127.0.0.1) 97.00ms
This is a known issue.
I've solved the same problem using the following temporary modification of Anaconda/Lib/mimetypes.py, lines 252-253 (as proposed here).
try:
    ctype = ctype.encode(default_encoding)  # omit in 3.x!
except UnicodeEncodeError:
    pass
except Exception:                           # <--
    pass                                    # <--
else:
    yield ctype

Python 3 doesn't read unicode file on a new server

My webpages are served by a script that dynamically imports a bunch of files with
try:
    with open(filename, 'r') as f:
        exec(f.read())
except IOError:
    pass
(actually, can you suggest a better method of importing a file? I'm sure there is one.)
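One alternative I've seen suggested, sketched on the assumption that the files only define module-level names such as title, is runpy.run_path (available in Python 2.7 and 3.2+, so it may not exist on 3.0); it executes a file and returns its globals as a dict:

import runpy

try:
    # Runs the file and returns its globals, instead of exec'ing into ours.
    settings = runpy.run_path(filename)   # e.g. settings['title']
except IOError:
    pass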
Sometimes the files have strings in different languages, like
# contents of language.ru
title = "Название"  # Russian for "Title"
Those were all saved as UTF-8 files. Python has no problem running the script from the command line or serving a page from my MacBook:
OK: [server command line] python3.0 page.py /index.ru
OK: http://whitebox.local/index.ru
but it throws an error when trying to serve a page from a server we just moved to:
157 try:
158     with open (filename, 'r') as f:
159         exec(f.read())
160 except IOError: pass
161
/usr/local/lib/python3.0/io.py in read(self=, n=-1)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 627: ordinal not in range(128)
All the files were copied from my laptop where they were perfectly served by Apache. What is the reason?
Update: I found out the default encoding for open() is platform-dependent, so it was utf8 on my laptop and ascii on the server. I wonder if there is a per-program way to set it in Python 3 (sys.setdefaultencoding is used in the site module and then deleted from the namespace).
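A quick way to confirm the difference between the two machines, assuming open() falls back to the locale's preferred encoding when none is given:

import locale

# Python 3's open() uses this value when no encoding= argument is passed.
print(locale.getpreferredencoding())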
Use open(filename, 'r', encoding='utf8').
See Python 3 docs for open.
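Applied to the snippet from the question:

try:
    with open(filename, 'r', encoding='utf8') as f:
        exec(f.read())
except IOError:
    pass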
Use the codecs library. I'm using Python 2.6.6, where the built-in open() does not take an encoding argument:
import codecs
codecs.open('filename', 'r', encoding='UTF-8')
You can use something like
with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
    data = f.read()
# make changes to the string 'data'
with open(fname + '.new', 'w', encoding="ascii", errors="surrogateescape") as f:
    f.write(data)
More information is in the Python Unicode documentation.