Problems with MySQL encoding - encoding

I have a serious problem with my populate script: characters are not stored correctly. My code:
def _create_Historial(self):
    datos = [self.DB_HOST, self.DB_USER, self.DB_PASS, self.DB_NAME]
    conn = MySQLdb.connect(*datos)
    cursor = conn.cursor()
    cont = 0
    with open('principal/management/commands/Historial_fichajes_jugadores.csv', 'rv') as csvfile:
        historialReader = csv.reader(csvfile, delimiter=',')
        for row in historialReader:
            if cont == 0:
                cont += 1
            else:
                #unicodedata.normalize('NFKD', unicode(row[4], 'latin1')).encode('ASCII', 'ignore'),
                cursor.execute('''INSERT INTO principal_historial(jugador_id, temporada, fecha, ultimoClub, nuevoClub, valor, coste) VALUES (%s,%s,%s,%s,%s,%s,%s)''',
                    (round(float(row[1]))+1,row[2], self.stringToDate(row[3]), unicode(row[4],'utf-8'), row[5], self.convertValue(row[6]), str(row[7])))
    conn.commit()
    cursor.close()
    conn.close()
The error is as follows:
Traceback (most recent call last):
File "/home/tfg/pycharm-2016.3.2/helpers/pycharm/django_manage.py", line 41, in <module>
run_module(manage_file, None, '__main__', True)
File "/usr/lib/python2.7/runpy.py", line 188, in run_module
fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 82, in _run_module_code
mod_name, mod_fname, mod_loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/tfg/TrabajoFinGrado/demoTFG/manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 443, in execute_from_command_line
utility.execute()
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 382, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 196, in run_from_argv
self.execute(*args, **options.__dict__)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 232, in execute
output = self.handle(*args, **options)
File "/home/tfg/TrabajoFinGrado/demoTFG/principal/management/commands/populate_db.py", line 230, in handle
self._create_Historial()
File "/home/tfg/TrabajoFinGrado/demoTFG/principal/management/commands/populate_db.py", line 217, in _create_Historial
(round(float(row[1]))+1,row[2], self.stringToDate(row[3]), unicode(row[4],'utf-8'), row[5], self.convertValue(row[6]), str(row[7])))
File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 187, in execute
query = query % tuple([db.literal(item) for item in args])
File "/usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py", line 278, in literal
return self.escape(o, self.encoders)
File "/usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py", line 208, in unicode_literal
return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 6-7: ordinal not in range(256)
The characters were shown as follows: Nicolás Otamendi, Gaël Clichy ....
When I print the characters in the Python shell, they are shown correctly.
Sorry for my English :(

Ok, I'll keep this brief.
You should convert encoded data/strs to Unicodes early in your code. Don't inline .decode()/.encode()/unicode() calls.
When you open a file in Python 2.7, it's opened in binary mode. You should use io.open(filename, encoding='utf-8'), which will read it as text and decode it from utf-8 to Unicodes.
The Python 2.7 CSV module is not Unicode compatible. You should install https://github.com/ryanhiebert/backports.csv
You need to tell the MySQL driver that you're going to pass Unicodes and use UTF-8 for the connection. This is done by adding the following keyword arguments to your MySQLdb.connect() call:
charset='utf8',
use_unicode=True
Pass Unicode strings to MySQL. Use the u'' prefix to avoid troublesome implied conversion.
Once the file is read through io.open and backports.csv, all your CSV data is already Unicode str. There's no need to convert it.
Putting it all together, your code will look like:
from backports import csv
import io

datos = [self.DB_HOST, self.DB_USER, self.DB_PASS, self.DB_NAME]
conn = MySQLdb.connect(*datos, charset='utf8', use_unicode=True)
cursor = conn.cursor()
cont = 0
with io.open('principal/management/commands/Historial_fichajes_jugadores.csv', 'r', encoding='utf-8') as csvfile:
    historialReader = csv.reader(csvfile, delimiter=',')
    for row in historialReader:
        if cont == 0:
            cont += 1  # skip the header row
        else:
            # The parameters must be passed as a single tuple.
            cursor.execute(u'''INSERT INTO principal_historial(jugador_id, temporada, fecha, ultimoClub, nuevoClub, valor, coste) VALUES (%s,%s,%s,%s,%s,%s,%s)''',
                (round(float(row[1]))+1, row[2], self.stringToDate(row[3]), row[4], row[5], self.convertValue(row[6]), row[7]))
conn.commit()
cursor.close()
conn.close()
You may also want to look at https://stackoverflow.com/a/35444608/1554386, which covers what Python 2.7 Unicodes are.
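The garbled names in the question (Nicolás, Gaël) are what UTF-8 bytes look like when they are decoded as Latin-1. Here is a minimal sketch of that effect, and of io.open returning decoded text. It assumes a throwaway temporary file in place of the real CSV; the sample row is made up for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

name = u'Nicol\xe1s Otamendi'            # the correctly decoded name
utf8_bytes = name.encode('utf-8')        # what is stored in a UTF-8 file

# Decoding UTF-8 bytes with the wrong codec reproduces the mojibake.
mojibake = utf8_bytes.decode('latin-1')

# Hypothetical sample file standing in for the real CSV.
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(name + u',some club\n')

# io.open decodes while reading, so each field is already Unicode text.
with io.open(path, 'r', encoding='utf-8') as f:
    first_field = f.readline().split(u',')[0]
os.remove(path)
```

Because the text is decoded once at the file boundary, nothing later in the code needs unicode() or .decode().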

Related

psycopg2.errors.UndefinedColumn: column website.sequence does not exist

I got the above error when I restarted the Odoo server in Docker while a user from our team was making changes to an Odoo module. Afterwards I was not able to restart Odoo.
When I run \d website, the column sequence does not exist in the table.
When I add the column manually, I get another error for another table.
File "/usr/lib/python3/dist-packages/odoo/tools/cache.py", line 90, in lookup
value = d[key] = self.method(*args, **kwargs)
File "/usr/lib/python3/dist-packages/odoo/addons/website/models/website.py", line 987, in _get_current_website_id
found_websites = self.search([('domain', 'ilike', _remove_port(domain_name))]).sorted('country_group_ids')
File "/usr/lib/python3/dist-packages/odoo/models.py", line 1811, in search
return res if count else self.browse(res)
File "/usr/lib/python3/dist-packages/odoo/models.py", line 5144, in browse
if not ids:
File "/usr/lib/python3/dist-packages/odoo/osv/query.py", line 215, in __bool__
return bool(self._result)
File "/usr/lib/python3/dist-packages/odoo/tools/func.py", line 26, in __get__
value = self.fget(obj)
File "/usr/lib/python3/dist-packages/odoo/osv/query.py", line 208, in _result
self._cr.execute(query_str, params)
File "<decorator-gen-3>", line 2, in execute
File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 89, in check
return f(self, *args, **kwargs)
File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 310, in execute
res = self._obj.execute(query, params)
psycopg2.errors.UndefinedColumn: column website.sequence does not exist
LINE 1: ...."domain"::text ilike '%#%') ORDER BY "website"....

How can I add JPEG image with non-UTF characters into a docx using python-docx?

I am trying to insert a series of JPEGs into a Word document using python-docx but it seems that some of them may have non-UTF-8 metadata included, which is causing docx to issue a Unicode decoding error message. How can I get around this?
Here is the code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from docx import Document
from docx.shared import Inches
from docx.enum.table import *
from docx.enum.text import WD_ALIGN_PARAGRAPH
from PIL import Image
from PIL.ExifTags import TAGS
document = Document()
table = document.add_table(rows=1, cols=1)
table.alignment = WD_TABLE_ALIGNMENT.CENTER
row_cells = table.add_row().cells
paragraph = row_cells[0].paragraphs[0]
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
run = paragraph.add_run()
run.add_picture("143269.jpg", height=Inches(5))
document.save('demo.docx')
And the error traceback:
runfile('/Users/fred/bin/nimble/dvids/picimporttest.py', wdir='/Users/fred/bin/nimble/dvids')
Traceback (most recent call last):
File "/Users/fred/bin/nimble/dvids/picimporttest.py", line 26, in <module>
run.add_picture("143269.jpg", height=Inches(5))
File "/usr/local/lib/python3.8/site-packages/docx/text/run.py", line 62, in add_picture
inline = self.part.new_pic_inline(image_path_or_stream, width, height)
File "/usr/local/lib/python3.8/site-packages/docx/parts/story.py", line 56, in new_pic_inline
rId, image = self.get_or_add_image(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/parts/story.py", line 29, in get_or_add_image
image_part = self._package.get_or_add_image_part(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/package.py", line 31, in get_or_add_image_part
return self.image_parts.get_or_add_image_part(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/package.py", line 74, in get_or_add_image_part
image = Image.from_file(image_descriptor)
File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 55, in from_file
return cls._from_stream(stream, blob, filename)
File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 176, in _from_stream
image_header = _ImageHeaderFactory(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/image.py", line 198, in _ImageHeaderFactory
return cls.from_stream(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 68, in from_stream
markers = _JfifMarkers.from_stream(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 111, in from_stream
for marker in marker_parser.iter_markers():
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 176, in iter_markers
marker = _MarkerFactory(
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 271, in _MarkerFactory
return marker_cls.from_stream(stream, marker_code, offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 413, in from_stream
tiff = cls._tiff_from_exif_segment(stream, offset, segment_length)
File "/usr/local/lib/python3.8/site-packages/docx/image/jpeg.py", line 455, in _tiff_from_exif_segment
return Tiff.from_stream(substream)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 36, in from_stream
parser = _TiffParser.parse(stream)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 63, in parse
ifd_entries = _IfdEntries.from_stream(stream_rdr, ifd0_offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 176, in from_stream
entries = dict((e.tag, e.value) for e in ifd_parser.iter_entries())
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 176, in <genexpr>
entries = dict((e.tag, e.value) for e in ifd_parser.iter_entries())
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 204, in iter_entries
ifd_entry = _IfdEntryFactory(self._stream_rdr, dir_entry_offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 231, in _IfdEntryFactory
return entry_cls.from_stream(stream_rdr, offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 255, in from_stream
value = cls._parse_value(
File "/usr/local/lib/python3.8/site-packages/docx/image/tiff.py", line 294, in _parse_value
return stream_rdr.read_str(value_count-1, value_offset)
File "/usr/local/lib/python3.8/site-packages/docx/image/helpers.py", line 71, in read_str
unicode_str = chars.decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 76: invalid start byte
and a sample problem jpeg.
Most jpgs work.
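The byte named in the traceback, 0x96, is an en dash in Windows-1252 and can never start a UTF-8 sequence. Here is a small sketch of that failure. It assumes, without confirmation from the question, that the EXIF text field was written in Windows-1252; the sample bytes are made up.

```python
# Hypothetical EXIF text bytes containing the byte from the traceback.
raw = b'Photo \x96 caption'

# Strict UTF-8 decoding fails on 0x96, as in read_str() in the traceback.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same bytes decode cleanly as Windows-1252 (0x96 is an en dash).
as_cp1252 = raw.decode('cp1252')

# A lenient decode keeps going and substitutes U+FFFD for the bad byte.
lenient = raw.decode('utf-8', errors='replace')
```

If this is the cause, one possible workaround is to re-save the image without its metadata and pass the cleaned file or stream to add_picture.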

Write Function Result to XML on Powershell

I'm using a template tool, https://github.com/noahmorrison/chevron (a Mustache implementation).
My issue is that when I run it in PowerShell and try to save the result to a file with:
chevron C:\templates\db.mustache -d C:\temp\db\dev.json | ConvertTo-Xml > .\Process.xml
Note that the command outputs XML.
I get the error:
Traceback (most recent call last):
File "c:\users\kevin.haunstetter\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\kevin.haunstetter\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\kevin.haunstetter\AppData\Local\Programs\Python\Python38-32\Scripts\chevron.exe\__main__.py", line 7, in <module>
File "c:\users\kevin.haunstetter\appdata\local\programs\python\python38-32\lib\site-packages\chevron\main.py", line 84, in cli_main
sys.stdout.write(main(**args))
File "c:\users\kevin.haunstetter\appdata\local\programs\python\python38-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>
Any idea?
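The character in the error, '\ufeff', is a byte order mark (BOM). One plausible source is a template or JSON file saved as UTF-8 with a BOM, though that is an assumption because the question does not show the files. The cp1252 codec seen in the traceback has no mapping for that character. A minimal sketch with a made-up template string:

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by the template text.
data = codecs.BOM_UTF8 + b'<db name="dev"/>'

# Plain 'utf-8' keeps the BOM as the first character of the text.
with_bom = data.decode('utf-8')

# 'utf-8-sig' strips a leading BOM if one is present.
without_bom = data.decode('utf-8-sig')

# cp1252 cannot represent U+FEFF, which matches the reported error.
try:
    with_bom.encode('cp1252')
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

If the BOM does come from an input file, re-saving that file as UTF-8 without a BOM would avoid the character reaching stdout at all.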

"InterfaceError: connection already closed" when using multiprocessing.Pool on black box function that queries PostgreSQL database

I've been given a Python (2.7) function that takes 3 strings as arguments, and returns a list of dictionaries. Due to the nature of the project, I can't alter the function, which is quite complex, calling several other non-standard Python modules and querying a PostgreSQL database using psycopg2. I think that it's the Postgres functionality that's causing me problems.
I want to use the multiprocessing module to speed up calling the function hundreds of times. I've written a "helper" function so that I can use multiprocessing.Pool (whose map passes only 1 argument to the worker function) with my function:
from function_script import function
def function_helper(args):
    return function(*args)
And my main code looks like this:
from helper_script import function_helper
from multiprocessing import Pool
argument_a = ['a0', 'a1', ..., 'a99']
argument_b = ['b0', 'b1', ..., 'b99']
argument_c = ['c0', 'c1', ..., 'c99']
input = zip(argument_a, argument_b, argument_c)
p = Pool(4)
results = p.map(function_helper, input)
print results
What I'm expecting is a list of lists of dictionaries, however I get the following errors:
Traceback (most recent call last):
File "/local/python/2.7/lib/python2.7/site-packages/variantValidator/variantValidator.py", line 898, in validator
vr.validate(input_parses)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 33, in validate
return self._ivr.validate(var, strict) and self._evr.validate(var, strict)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 69, in validate
(res, msg) = self._ref_is_valid(var)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 89, in _ref_is_valid
var_x = self.vm.c_to_n(var) if var.type == "c" else var
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 223, in c_to_n
tm = self._fetch_TranscriptMapper(tx_ac=var_c.ac, alt_ac=var_c.ac, alt_aln_method="transcript")
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 372, in _fetch_TranscriptMapper
self.hdp, tx_ac=tx_ac, alt_ac=alt_ac, alt_aln_method=alt_aln_method)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/transcriptmapper.py", line 69, in __init__
self.tx_identity_info = hdp.get_tx_identity_info(self.tx_ac)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 353, in get_tx_identity_info
rows = self._fetchall(self._queries['tx_identity_info'], [tx_ac])
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 216, in _fetchall
with self._get_cursor() as cur:
File "/local/python/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 529, in _get_cursor
cur.execute("set search_path = " + self.url.schema + ";")
File "/local/python/2.7/lib/python2.7/site-packages/psycopg2/extras.py", line 144, in execute
return super(DictCursor, self).execute(query, vars)
DatabaseError: SSL error: decryption failed or bad record mac
And:
Traceback (most recent call last):
File "/local/python/2.7/lib/python2.7/site-packages/variantValidator/variantValidator.py", line 898, in validator
vr.validate(input_parses)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 33, in validate
return self._ivr.validate(var, strict) and self._evr.validate(var, strict)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 69, in validate
(res, msg) = self._ref_is_valid(var)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/validator.py", line 89, in _ref_is_valid
var_x = self.vm.c_to_n(var) if var.type == "c" else var
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 223, in c_to_n
tm = self._fetch_TranscriptMapper(tx_ac=var_c.ac, alt_ac=var_c.ac, alt_aln_method="transcript")
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/variantmapper.py", line 372, in _fetch_TranscriptMapper
self.hdp, tx_ac=tx_ac, alt_ac=alt_ac, alt_aln_method=alt_aln_method)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/transcriptmapper.py", line 69, in __init__
self.tx_identity_info = hdp.get_tx_identity_info(self.tx_ac)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/decorators/lru_cache.py", line 176, in wrapper
result = user_function(*args, **kwds)
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 353, in get_tx_identity_info
rows = self._fetchall(self._queries['tx_identity_info'], [tx_ac])
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 216, in _fetchall
with self._get_cursor() as cur:
File "/local/python/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/local/python/2.7/lib/python2.7/site-packages/hgvs/dataproviders/uta.py", line 526, in _get_cursor
conn.autocommit = True
InterfaceError: connection already closed
Does anybody know what might cause the Pool function to behave like this, when it seems so simple to use in other examples that I've tried? If this isn't enough information to go on, can anyone advise me on a way of getting to the bottom of the problem (this is the first time I've worked with someone else's code)? Alternatively, are there any other ways that I could use the multiprocessing module to call the function hundreds of times?
Thanks
I think what may be happening is that your connection object is shared across all workers. When one worker has completed all its tasks, it closes the connection while the other workers are still working, so when one of those workers then tries to use the database, the connection is already closed.
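If a shared connection is indeed the cause, a common pattern is to give every worker process its own connection through the Pool initializer. The following is only a sketch of that pattern: sqlite3 stands in for PostgreSQL, and init_worker and square are hypothetical names, not part of the black-box code.

```python
import sqlite3
from multiprocessing import Pool

_conn = None  # one connection per worker process, set by init_worker()


def init_worker():
    """Open a connection that belongs to the current process only."""
    global _conn
    _conn = sqlite3.connect(':memory:')


def square(x):
    """Hypothetical stand-in for the black-box function: runs one query."""
    return _conn.execute('SELECT ? * ?', (x, x)).fetchone()[0]


if __name__ == '__main__':
    # Each of the 2 workers runs init_worker() once before taking tasks.
    pool = Pool(2, initializer=init_worker)
    try:
        results = pool.map(square, range(5))
    finally:
        pool.close()
        pool.join()
    print(results)
```

With the real code, a comparable approach might be to make sure the module that opens the PostgreSQL connection is first imported inside each worker, rather than in the parent process before the pool is created.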

Basic Grako example gives IndexError

I'd like to get started with Grako (3.6.6) and as a first experience with parsers I wanted to generate an HTML table from a custom syntax. The following basic test
import grako
grammar = """table = { row }+ ;
row = (cell1:cell "|" cell2:cell) "\n";
cell = /[a-z]+/ ;
"""
model = grako.genmodel("model", grammar)
ast = model.parse(
"""a | b
c | d
""", "table")
print(ast)
results in an error
File "test.py", line 13, in <module>
""", "table")
File "grako\grammars.py", line 790, in grako.grammars.Grammar.parse (grako\grammars.c:27773)
File "grako\grammars.py", line 97, in grako.grammars.GrakoContext.parse (grako\grammars.c:4391)
File "grako\contexts.py", line 180, in grako.contexts.ParseContext.parse (grako\contexts.c:4313)
File "grako\grammars.py", line 594, in grako.grammars.Rule.parse (grako\grammars.c:22253)
File "grako\grammars.py", line 597, in grako.grammars.Rule._parse_rhs (grako\grammars.c:22435)
File "grako\contexts.py", line 399, in grako.contexts.ParseContext._call (grako\contexts.c:10088)
File "grako\contexts.py", line 433, in grako.contexts.ParseContext._invoke_rule (grako\contexts.c:11135)
File "grako\grammars.py", line 435, in grako.grammars.PositiveClosure.parse (grako\grammars.c:17285)
File "grako\contexts.py", line 695, in grako.contexts.ParseContext._positive_closure (grako\contexts.c:19286)
File "grako\contexts.py", line 696, in grako.contexts.ParseContext._positive_closure (grako\contexts.c:19240)
File "grako\grammars.py", line 435, in grako.grammars.PositiveClosure.parse.lambda10 (grako\grammars.c:17195)
File "grako\grammars.py", line 547, in grako.grammars.RuleRef.parse (grako\grammars.c:20774)
File "grako\grammars.py", line 594, in grako.grammars.Rule.parse (grako\grammars.c:22253)
File "grako\grammars.py", line 597, in grako.grammars.Rule._parse_rhs (grako\grammars.c:22435)
File "grako\contexts.py", line 399, in grako.contexts.ParseContext._call (grako\contexts.c:10088)
File "grako\contexts.py", line 433, in grako.contexts.ParseContext._invoke_rule (grako\contexts.c:11135)
File "grako\grammars.py", line 326, in grako.grammars.Sequence.parse (grako\grammars.c:11582)
File "grako\grammars.py", line 268, in grako.grammars.Token.parse (grako\grammars.c:9463)
File "grako\contexts.py", line 543, in grako.contexts.ParseContext._token (grako\contexts.c:13772)
File "grako\buffering.py", line 301, in grako.buffering.Buffer.match (grako\buffering.c:9168)
IndexError: string index out of range
which happens to be partial_match = (token[0].isalpha() and token.isalnum() and self.is_name_char(self.current()) )
Although I'm new to parsers and the documentation is a little sparse, I'd like to stick with Grako.
Can you help me set up a basic example which outputs the HTML for a table?
Grako is not seeing the "\n" in the grammar correctly because newlines are not allowed in tokens, and the \n is being evaluated in the context of the outer, triple-quote ("""), string. Things work fine if you use /\n/ instead.
Also note that if \n will be part of the language, then you should probably write a @@whitespace directive so the parser doesn't skip over the character:
@@whitespace :: /[\t ]+/
This is the correct grammar for your language:
grammar = """
@@whitespace :: /[\t ]+/
table = { row }+ ;
row = (cell1:cell "|" cell2:cell) "\\n";
cell = /[a-z]+/ ;
"""
I'm currently patching Grako to detect and report errors like the one in your grammar. The changes are already in the Bitbucket repository. I'll make a release after I finish testing.
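The escaping issue can be checked in plain Python, without Grako. This small sketch uses a shortened, made-up rule to show what the grammar text actually contains for each way of writing the newline token:

```python
# In a normal triple-quoted string, Python turns \n into a real newline
# before the grammar text is ever handed to a parser generator.
plain = """row = cell "\n";"""

# Doubling the backslash keeps the two characters backslash + n.
escaped = """row = cell "\\n";"""

# A raw string has the same effect without doubling.
raw = r"""row = cell "\n";"""
```

Here plain contains a literal line break inside the quoted token, while escaped and raw are identical and contain the two-character sequence backslash-n.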