I want to use gfile.FastGFile(image_path, 'rb').read() to read a picture and use it as the input of my project, and I use the directory name as the label of the pictures contained in that directory. When the directory name is in English, my code works fine, but when the directory name is in Chinese, it throws this error:
Traceback (most recent call last):
  File "F:/pythonWS/imageFilter/jpegFileJudge.py", line 27, in <module>
    image_data = gfile.FastGFile(image_path, 'rb').read()
  File "C:\Program Files\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 106, in read
    self._preread_check()
  File "C:\Program Files\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 73, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "C:\Program Files\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Program Files\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: F:\vsWorkspace\pics\test\三宝鸟\0ff41bd5ad6eddc403fa02d13bdbb6fd526633fe.jpg : ϵͳ\udcd5Ҳ\udcbb\udcb5\udcbdָ\udcb6\udca8\udcb5\udcc4\udcceļ\udcfe\udca1\udca3
My test code is:
# -*- coding: utf-8 -*-
import glob
import os
import random
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

image_folder = 'F:/vsWorkspace/pics/test'
os.chdir(image_folder)
count = 0
for each in os.listdir(image_folder):
    each = os.path.abspath(each)
    os.chdir(each)
    for image_path in os.listdir(each):
        image_path = os.path.abspath(image_path)
        print(image_path)
        image_data = gfile.FastGFile(image_path, 'rb').read()
        count += 1
    os.chdir(image_folder)
My environment is Windows 7 x64, Python 3.5.3 and TensorFlow 1.0. How can I solve this problem?
By the way, I have to use the Chinese directory names as the labels of my pictures.
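One workaround I may try (an untested sketch, assuming the rest of my pipeline only needs the raw JPEG bytes) is to read the bytes with the built-in open(), which handles non-ASCII Windows paths, instead of gfile.FastGFile:

# Untested workaround sketch: read the JPEG bytes with the built-in open()
# instead of gfile.FastGFile, so TensorFlow's C++ file I/O never sees the
# non-ASCII Windows path.
with open(image_path, 'rb') as f:
    image_data = f.read()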
I have a serious problem with my populate command: characters are not stored correctly. My code:
def _create_Historial(self):
    datos = [self.DB_HOST, self.DB_USER, self.DB_PASS, self.DB_NAME]
    conn = MySQLdb.connect(*datos)
    cursor = conn.cursor()
    cont = 0
    with open('principal/management/commands/Historial_fichajes_jugadores.csv', 'rv') as csvfile:
        historialReader = csv.reader(csvfile, delimiter=',')
        for row in historialReader:
            if cont == 0:
                cont += 1
            else:
                #unicodedata.normalize('NFKD', unicode(row[4], 'latin1')).encode('ASCII', 'ignore'),
                cursor.execute('''INSERT INTO principal_historial(jugador_id, temporada, fecha, ultimoClub, nuevoClub, valor, coste) VALUES (%s,%s,%s,%s,%s,%s,%s)''',
                    (round(float(row[1]))+1, row[2], self.stringToDate(row[3]), unicode(row[4], 'utf-8'), row[5], self.convertValue(row[6]), str(row[7])))
    conn.commit()
    cursor.close()
    conn.close()
The error is the following:
Traceback (most recent call last):
  File "/home/tfg/pycharm-2016.3.2/helpers/pycharm/django_manage.py", line 41, in <module>
    run_module(manage_file, None, '__main__', True)
  File "/usr/lib/python2.7/runpy.py", line 188, in run_module
    fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 82, in _run_module_code
    mod_name, mod_fname, mod_loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/tfg/TrabajoFinGrado/demoTFG/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/home/tfg/TrabajoFinGrado/demoTFG/principal/management/commands/populate_db.py", line 230, in handle
    self._create_Historial()
  File "/home/tfg/TrabajoFinGrado/demoTFG/principal/management/commands/populate_db.py", line 217, in _create_Historial
    (round(float(row[1]))+1,row[2], self.stringToDate(row[3]), unicode(row[4],'utf-8'), row[5], self.convertValue(row[6]), str(row[7])))
  File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 187, in execute
    query = query % tuple([db.literal(item) for item in args])
  File "/usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py", line 278, in literal
    return self.escape(o, self.encoders)
  File "/usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py", line 208, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 6-7: ordinal not in range(256)
The characters in question are names like: Nicolás Otamendi, Gaël Clichy, ....
When I print the characters in the Python shell, they are shown correctly.
Sorry for my English :(
Ok, I'll keep this brief.
You should convert encoded data/strs to Unicode early in your code. Don't inline .decode()/.encode()/unicode() calls.
When you open a file with the built-in open() in Python 2.7, you get raw byte strings. You should use io.open(filename, encoding='utf-8'), which reads it as text and decodes it from UTF-8 to Unicode.
The Python 2.7 csv module is not Unicode compatible. You should install https://github.com/ryanhiebert/backports.csv
You need to tell the MySQL driver that you're going to pass Unicode and that it should use UTF-8 for the connection. This is done by adding the following to your connection call:
charset='utf8',
use_unicode=True
Pass Unicode strings to MySQL. Use the u'' prefix to avoid troublesome implicit conversions.
All your CSV data is then already Unicode, so there's no need to convert it.
Putting it all together, your code will look like:
from backports import csv
import io

datos = [self.DB_HOST, self.DB_USER, self.DB_PASS, self.DB_NAME]
conn = MySQLdb.connect(*datos, charset='utf8', use_unicode=True)
cursor = conn.cursor()
cont = 0
with io.open('principal/management/commands/Historial_fichajes_jugadores.csv', 'r', encoding='utf-8') as csvfile:
    historialReader = csv.reader(csvfile, delimiter=',')
    for row in historialReader:
        if cont == 0:
            cont += 1
        else:
            cursor.execute(u'''INSERT INTO principal_historial(jugador_id, temporada, fecha, ultimoClub, nuevoClub, valor, coste) VALUES (%s,%s,%s,%s,%s,%s,%s)''',
                (round(float(row[1]))+1, row[2], self.stringToDate(row[3]), row[4], row[5], self.convertValue(row[6]), row[7]))
conn.commit()
cursor.close()
conn.close()
You may also want to look at https://stackoverflow.com/a/35444608/1554386, which covers what Python 2.7 Unicodes are.
I am using Python 3.5 on both Windows and Linux and get the same error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 0: ordinal not in range(128)
The error log is the following:
Reloaded modules: lazylinker_ext
Traceback (most recent call last):
  File "<ipython-input-2-d60a2349532e>", line 1, in <module>
    runfile('C:/Users/YZC/Google Drive/sunday/data/RA/data_20100101_20150622/w2v_coherence.py', wdir='C:/Users/YZC/Google Drive/sunday/data/RA/data_20100101_20150622')
  File "C:\Users\YZC\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "C:\Users\YZC\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 88, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
  File "C:/Users/YZC/Google Drive/sunday/data/RA/data_20100101_20150622/w2v_coherence.py", line 70, in <module>
    model = gensim.models.Word2Vec.load('model_all_no_lemma')
  File "C:\Users\YZC\Anaconda3\lib\site-packages\gensim\models\word2vec.py", line 1485, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py", line 248, in load
    obj = unpickle(fname)
  File "C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py", line 912, in unpickle
    return _pickle.loads(f.read())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 0: ordinal not in range(128)
1. I checked and found the default encoding is utf-8:
import sys
sys.getdefaultencoding()
Out[2]: 'utf-8'
2. When reading the file, I also added .decode('utf-8').
3. I did add a shebang line at the beginning and declared utf-8 as the source encoding.
So I really don't know why Python couldn't read the file. Can anybody help me out?
Here is the code:
# -*- coding: utf-8 -*-
import gensim
import csv
import numpy as np
import math
import string

from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob, Word


class SpeechParser(object):

    def __init__(self, filename):
        self.filename = filename
        self.lemmatize = WordNetLemmatizer().lemmatize
        self.cached_stopwords = stopwords.words('english')

    def __iter__(self):
        with open(self.filename, 'rb', encoding='utf-8') as csvfile:
            file_reader = csv.reader(csvfile, delimiter=',', quotechar='|', )
            headers = file_reader.next()
            for row in file_reader:
                parsed_row = self.parse_speech(row[-2])
                yield parsed_row

    def parse_speech(self, row):
        speech_words = row.replace('\r\n', ' ').strip().lower().translate(None, string.punctuation).decode('utf-8', 'ignore')
        return speech_words.split()

    # -- source: https://github.com/prateekpg2455/U.S-Presidential-Speeches/blob/master/speech.py --
    def pos(self, tag):
        if tag.startswith('J'):
            return wordnet.ADJ
        elif tag.startswith('V'):
            return wordnet.VERB
        elif tag.startswith('N'):
            return wordnet.NOUN
        elif tag.startswith('R'):
            return wordnet.ADV
        else:
            return ''


if __name__ == '__main__':
    # instantiate object
    sentences = SpeechParser("sample.csv")

    # load an existing model
    model = gensim.models.Word2Vec.load('model_all_no_lemma')

    print('\n-----------------------------------------------------------')
    print('MODEL:\t{0}'.format(model))

    vocab = model.vocab

    # print log-probability of first 10 sentences
    row_count = 0
    print('\n------------- Scores for first 10 documents: -------------')
    for doc in sentences:
        print(sum(model.score(doc))/len(doc))
        row_count += 1
        if row_count > 10:
            break
    print('\n-----------------------------------------------------------')
It looks like a bug in Gensim when you try to use a Python 2 pickle file that has non-ASCII chars in it with Python 3.
The unpickle is happening when you call:
model = gensim.models.Word2Vec.load('model_all_no_lemma')
In Python 3, during the unpickle it wants to convert legacy byte strings to (Unicode) strings. The default action is to decode with 'ASCII' in strict mode.
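To see what that means in isolation (a minimal sketch using a hand-built Python 2 style pickle, not the actual gensim model file): the load fails with the default settings but succeeds once pickle.loads is given an encoding.

import pickle

# A pickle as Python 2 would write it for the byte string 'Jos\xc3\xa9'
# (UTF-8 encoded "José"), hand-built here so the example runs under Python 3.
py2_pickle = b'\x80\x02U\x06Jos\xc3\xa9q\x00.'

try:
    pickle.loads(py2_pickle)  # default: decode legacy strings as ASCII, strict
except UnicodeDecodeError as exc:
    print('default load fails:', exc)

print(pickle.loads(py2_pickle, encoding='utf-8'))  # 'José'
print(pickle.loads(py2_pickle, encoding='bytes'))  # b'Jos\xc3\xa9'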
The fix will be dependent on the encoding in your original pickle file and will require you to patch the gensim code.
I'm not familiar with gensim so you will have to try the following two options:
Force UTF-8
Chances are, your non-ASCII data is in UTF-8 format.
Edit C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py
Go to line 912
Change the line to read:
return _pickle.loads(f.read(), encoding='utf-8')
Byte mode
Gensim in Python 3 may happily work with byte strings:
Edit C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py
Go to line 912
Change the line to read:
return _pickle.loads(f.read(), encoding='bytes')
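If you would rather not edit the installed gensim files, a monkey-patch along the same lines may also work. This is an untested sketch; it assumes the model is a single plain local file whose strings are UTF-8, and it swaps gensim's unpickle helper for one that passes the encoding:

import pickle
import gensim
import gensim.utils

def unpickle_utf8(fname):
    # Same job as gensim.utils.unpickle, but tells pickle how to decode
    # the legacy Python 2 byte strings.
    with open(fname, 'rb') as f:
        return pickle.loads(f.read(), encoding='utf-8')

gensim.utils.unpickle = unpickle_utf8  # patch before loading
model = gensim.models.Word2Vec.load('model_all_no_lemma')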
I am trying to use Python 3.4 and boto3 to walk an S3 bucket and publish some file locations to an RDS instance. The part of this effort I am having trouble with is using boto3. My lambda function looks like the following:
import subprocess

def lambda_handler(event, context):
    args = ("venv/bin/python3.4", "run.py")
    popen = subprocess.Popen(args, stdout=subprocess.PIPE)
    popen.wait()
    output = popen.stdout.read()
    print(output)
and in my run.py file I have these lines:
import boto3
s3c = boto3.client('s3')
which cause an exception. The rest of run.py is not relevant to this question, however; to keep this post concise, I've found that the error can be reproduced by executing this lambda function:
import subprocess

def lambda_handler(event, context):
    args = ("python3.4", "-c", "import boto3; print(boto3.client('s3'))")
    popen = subprocess.Popen(args, stdout=subprocess.PIPE)
    popen.wait()
    output = popen.stdout.read()
    print(output)
My logstream reports the error:
Event Data
START RequestId: 2b65421a-664d-11e6-81db-974c7c09d283 Version: $LATEST
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/var/runtime/boto3/__init__.py", line 79, in client
    return _get_default_session().client(*args, **kwargs)
  File "/var/runtime/boto3/session.py", line 250, in client
    aws_session_token=aws_session_token, config=config)
  File "/var/runtime/botocore/session.py", line 818, in create_client
    client_config=config, api_version=api_version)
  File "/var/runtime/botocore/client.py", line 63, in create_client
    cls = self._create_client_class(service_name, service_model)
  File "/var/runtime/botocore/client.py", line 85, in _create_client_class
    base_classes=bases)
  File "/var/runtime/botocore/hooks.py", line 227, in emit
    return self._emit(event_name, kwargs)
  File "/var/runtime/botocore/hooks.py", line 210, in _emit
    response = handler(**kwargs)
  File "/var/runtime/boto3/utils.py", line 61, in _handler
    module = import_module(module)
  File "/var/runtime/boto3/utils.py", line 52, in import_module
    __import__(name)
  File "/var/runtime/boto3/s3/inject.py", line 13, in <module>
    from boto3.s3.transfer import S3Transfer
  File "/var/runtime/boto3/s3/transfer.py", line 135, in <module>
    from concurrent import futures
  File "/var/runtime/concurrent/futures/__init__.py", line 8, in <module>
    from concurrent.futures._base import (FIRST_COMPLETED,
  File "/var/runtime/concurrent/futures/_base.py", line 357
    raise type(self._exception), self._exception, self._traceback
                               ^
SyntaxError: invalid syntax
END RequestId: 2b65421a-664d-11e6-81db-974c7c09d283
REPORT RequestId: 2b65421a-664d-11e6-81db-974c7c09d283 Duration: 2673.45 ms Billed Duration: 2700 ms Memory Size: 1024 MB Max Memory Used: 61 MB
I need to use boto3 downstream of run.py. Any ideas on how to resolve this are much appreciated. Thanks!
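My current guess, which I have not verified: the child python3.4 inherits Lambda's PYTHONPATH and so imports the Python-2-only futures backport from /var/runtime, which is what raises the SyntaxError. One thing I plan to try is giving the subprocess an environment without that path and relying on the boto3 bundled inside my venv instead (a sketch; the environment handling here is my assumption):

import os
import subprocess

def lambda_handler(event, context):
    # Keep the child interpreter from inheriting /var/runtime via PYTHONPATH,
    # so it picks up the boto3/futures installed in the bundled venv instead.
    env = dict(os.environ)
    env.pop('PYTHONPATH', None)
    args = ("venv/bin/python3.4", "run.py")
    popen = subprocess.Popen(args, stdout=subprocess.PIPE, env=env)
    popen.wait()
    print(popen.stdout.read())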
I have unicode data as read from this file:
Mdt,Doccompra,OrgC,Cen,NumP,Criadopor,Dtcriacao,Fornecedor,P,Fun
400,8751215432,2581,,1,MIGRAÇÃO,01.10.2004,75852214,,TD
400,5464282154,9874,,1,MIGRAÇÃO,01.10.2004,78995411,,FO
I have two problems:
When I try to query this unicode data I get a UnicodeDecodeError:
Traceback (most recent call last):
  File "<ipython-input-1-4423dceb2b1d>", line 1, in <module>
    runfile('C:/Users/u5en/Documents/SAP/Programação/Problema HDF.py', wdir='C:/Users/u5en/Documents/SAP/Programação')
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
    execfile(filename, namespace)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 48, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
  File "C:/Users/u5en/Documents/SAP/Programação/Problema HDF.py", line 15, in <module>
    store.select("EKKA", "columns=['Mdt', 'Fornecedor']")
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 665, in select
    return it.get_result()
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 1359, in get_result
    results = self.func(self.start, self.stop, where)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 658, in func
    columns=columns, **kwargs)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3968, in read
    if not self.read_axes(where=where, **kwargs):
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3201, in read_axes
    a.convert(values, nan_rep=self.nan_rep, encoding=self.encoding)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 2058, in convert
    self.data, nan_rep=nan_rep, encoding=encoding)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 4359, in _unconvert_string_array
    data = f(data)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 1700, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 1769, in _vectorize_call
    outputs = ufunc(*inputs)
  File "C:\Users\u5en\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 4358, in <lambda>
    f = np.vectorize(lambda x: x.decode(encoding), otypes=[np.object])
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 7: unexpected end of data
How can I store and query my unicode data in hdf5?
I have many tables with column names I do not know beforehand and which are not proper pytable names (NaturalNameWarning). I would like the user to be able to query on these columns, so I wonder how I could query them when their names prevent me from doing so. I see this used to have no easy fix, so if that is still the case I will just remove the offending characters from the headings.
import csv
import pandas as pd
dados = pd.read_csv("EKKA - Cópia.csv")
print(dados)
store= pd.HDFStore('teste.h5' , encoding="utf-8")
store.append("EKKA", dados, format="table", data_columns=True)
store.select("EKKA", "columns=['Mdt', 'Fornecedor']")
store.close()
Would I be better off doing this in sqlite?
Environment:
Windows 7 64bit
pandas 0.15.2
NumPy 1.9.2
So under Python 2.7 on Windows 7 with pandas 0.15.2, everything worked as expected, no encoding necessary. However, on Python 3.4, the following worked for me. Apparently some of the stored bytes are not valid 'utf-8'; 'latin1' encoding usually solves these issues. Note that I had to read the csv in the first place with this encoding.
>>> df = pd.read_csv('../../test.csv',encoding='latin1')
>>> df
Mdt Doccompra OrgC Cen NumP Criadopor Dtcriacao Fornecedor P Fun
0 400 8751215432 2581 NaN 1 MIGRAÇ\xc3O 01.10.2004 75852214 NaN TD
1 400 5464282154 9874 NaN 1 MIGRAÇ\xc3O 01.10.2004 78995411 NaN FO
Further, the encoding must be specified not when opening the store, but on the append/put calls
>>> df.to_hdf('test.h5','df',format='table',mode='w',data_columns=True,encoding='latin1')
>>> pd.read_hdf('test.h5','df')
Mdt Doccompra OrgC Cen NumP Criadopor Dtcriacao Fornecedor P Fun
0 400 8751215432 2581 NaN 1 MIGRAÇ\xc3O 01.10.2004 75852214 NaN TD
1 400 5464282154 9874 NaN 1 MIGRAÇ\xc3O 01.10.2004 78995411 NaN FO
Once it is written encoded, it is not necessary to specify the encoding when reading.
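For the querying half of the question: since the store is written in table format with data_columns=True, you should then be able to select on those columns as usual. A small sketch (column names taken from the sample above):

import pandas as pd

# Select a subset of columns and filter on one of the data columns.
subset = pd.read_hdf('test.h5', 'df', where="Mdt == 400", columns=['Mdt', 'Fornecedor'])
print(subset)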