I'm trying to upload some files to GCS and I get this:
Building synchronization state...
Caught non-retryable exception while listing file:///media/Respaldo: CommandException: Invalid Unicode path encountered
(u'/media/Respaldo/Documentos/Trabajo/Traducciones/Servicio
Preventivo Semanal Hs Rev3 - Ingl\xe9s.doc'). gsutil cannot
proceed with such files present. Please remove or rename this file and
try again. NOTE: the path printed above replaces the problematic
characters with a hex-encoded printable representation. For more
details (including how to convert to a gsutil-compatible encoding) see
`gsutil help encoding`.
But when I run:
convmv -f ISO-8859-1 -t UTF-8 -r --replace /media/Respaldo
it says all the non-English files are already UTF-8. How should I proceed?
Edit: example of convmv output:
Skipping, already UTF-8: /media/Respaldo/Multimedia/Mis Imágenes/NOKIA/Memoria/Videoclips/VÃdeo004.3gp
Skipping, already UTF-8: /media/Respaldo/Multimedia/Mis Imágenes/NOKIA/Memoria/Videoclips/VÃdeo009.3gp
Skipping, already UTF-8: /media/Respaldo/Multimedia/Mis Imágenes/NOKIA/Memoria/Videoclips/VÃdeo00133.3gp
Skipping, already UTF-8: /media/Respaldo/Multimedia/Mis Imágenes/NOKIA/Memoria/Videoclips/VÃdeo023.3gp
Skipping, already UTF-8: /media/Respaldo/Multimedia/Mis Imágenes/NOKIA/Memoria/Videoclips/VÃdeo026.3gp
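The two tools are complaining about different files: convmv skips names like "VÃdeo004.3gp" because they are valid UTF-8 bytes (double-encoded mojibake, which convmv's --fixdouble option targets), while gsutil is choking on names like "Ingl\xe9s.doc" that are raw Latin-1 and not valid UTF-8 at all. A minimal sketch, assuming the remaining bad names are Latin-1 and that /media/Respaldo is the root to scan, which walks the tree at the byte level and re-encodes only the names that fail UTF-8 decoding:

```python
import os

def to_utf8(name: bytes) -> bytes:
    """Return the name unchanged if it is already valid UTF-8,
    otherwise reinterpret it as Latin-1 and re-encode as UTF-8."""
    try:
        name.decode("utf-8")
        return name
    except UnicodeDecodeError:
        return name.decode("latin-1").encode("utf-8")

def fix_tree(root: bytes, dry_run: bool = True) -> None:
    # Walk bottom-up so renaming a directory does not invalidate
    # paths we still have to visit inside it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = to_utf8(name)
            if fixed != name:
                old = os.path.join(dirpath, name)
                new = os.path.join(dirpath, fixed)
                print(old, b"->", new)
                if not dry_run:
                    os.rename(old, new)
```

Run it with dry_run=True first and review the printed renames before letting it touch anything.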
Related
I am trying to use the PostgreSQL COPY command to insert a UTF-16 encoded CSV into a table. However, when running the query below:
COPY temp
FROM 'C:\folder\some_file.csv'
WITH (
DELIMITER E'\t',
FORMAT csv,
HEADER);
I get the error below:
ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY temp, line 1
SQL state: 22021
and when I run the same query but add an encoding setting, ENCODING 'UTF-16' or ENCODING 'UTF 16', to the WITH block, I get the error below:
ERROR: argument to option "encoding" must be a valid encoding name
LINE 13: ENCODING 'UTF 16' );
^
SQL state: 22023
Character: 377
I've looked through the PostgreSQL documentation to try to find the correct encoding name, but haven't managed to find anything. Is this because COPY does not support UTF-16 encoded files? I would have thought that this would almost certainly be possible!
I'm running PostgreSQL 12 on Windows 10 Pro.
Any help would be hugely appreciated!
No, you cannot do that.
UTF-16 is not in the list of supported encodings.
PostgreSQL will never support an encoding that is not an extension of ASCII.
You will have to convert the file to UTF-8.
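A minimal conversion sketch: Python's "utf-16" codec reads the BOM (the 0xff byte in the first error is the first byte of a UTF-16 LE BOM) and handles either byte order, so re-encoding the file as UTF-8 is a few lines. The paths are placeholders for your own files:

```python
def utf16_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-16 file (BOM detected automatically) as UTF-8."""
    with open(src_path, "r", encoding="utf-16") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)

# e.g. utf16_to_utf8(r"C:\folder\some_file.csv",
#                    r"C:\folder\some_file_utf8.csv")
```

Then point COPY at the converted file with ENCODING 'UTF8' (or no ENCODING option at all if your database is already UTF-8).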
I'm trying to migrate a database from SQLite to PostgreSQL using pgloader.
My SQLite db is data.db, so I try this:
pgloader ./var/data.db postgres://***@ec2-54-83-50-174.compute-1.amazonaws.com:5432/mydb?sslmode=require
Output:
pgloader version 3.6.1
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/var/folders/65/x6spw10s4jgd3qkhdq96bk8c0000gn/T/"
KABOOM!
2019-04-11T19:22:47.022000+01:00 NOTICE Starting pgloader, log system is ready.
FATAL error: :UTF-8 stream decoding error on #<SB-SYS:FD-STREAM for "file /Users/mackbookpro/Desktop/dev/www/Beyti/var/data.db" {1005892A93}>: the octet sequence #(130) cannot be decoded.
Date/time: 2019-04-11-18:22An unhandled error condition has been signalled: :UTF-8 stream decoding error on #<SB-SYS:FD-STREAM for "file /Users/mackbookpro/Desktop/dev/www/Beyti/var/data.db" {1005892A93}>: the octet sequence #(130) cannot be decoded.
Any idea about this problem? Thank you in advance.
This is a character encoding issue.
The culprit "octet sequence #(130)" corresponded to "é" in my case, which was encoded as \x82.
iconv failed.
I replaced those corrupted \x82 bytes in the byte stream with \x65 (ASCII "e"), and that got me past it.
<bad_file xxd -c1 -p | sed s/82/65/ | xxd -r -p > good_new_file
(cheers to Natacha on irc freenode #gcu :) )
Edit: French issues? Same problem with #133 ("à"); same solution, \x85 -> \x61 (ASCII "a").
Edit 2: A little generalization I just found:
The "octet sequence" pgloader reports is the decimal value of the offending byte. Anything above 127 falls outside plain ASCII (into the extended-ASCII range) and triggers the error.
I just hit an issue with #144: that is \x90. Replace it the same way :)
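The specific bytes listed here (0x82 for "é", 0x85 for "à", 0x90 for "É") happen to match IBM code page 437/850, so rather than patching bytes one at a time you can re-decode affected text with that codec and re-encode as UTF-8. A sketch, assuming cp437 is indeed the source encoding; note this is only safe for extracted text values or a text dump, not for the binary .db file itself (the byte-for-byte sed replacement above works on the .db precisely because it keeps the file length unchanged, whereas UTF-8 re-encoding does not):

```python
def cp437_to_utf8(data: bytes) -> bytes:
    """Reinterpret bytes as IBM code page 437 (where 0x82 is é,
    0x85 is à, 0x90 is É) and re-encode them as UTF-8."""
    return data.decode("cp437").encode("utf-8")
```

For example, cp437_to_utf8 applied to b"caf\x82" yields the UTF-8 bytes for "café".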
I get an error when trying to use the COPY command to load data from a CSV file with UCS-2 LE BOM encoding (as reported by Notepad++).
COPY pub.calls (............ )
FROM 'c:\IMPORT\calls.csv'
WITH
DELIMITER ','
HEADER
CSV
ENCODING 'UCS2';
The error is something like this:
SQL Error [22023]: The argument of the encoding parameter should be an acceptable encoding name.
UCS-2 gives the same error.
For the list of supported charsets, see:
https://www.postgresql.org/docs/current/static/multibyte.html
or, in psql, type \encoding and double-tab for autocompletion:
postgres=# \encoding
BIG5 EUC_JP GB18030 ISO_8859_6 JOHAB LATIN1 LATIN3 LATIN6 LATIN9 SJIS UTF8 WIN1252 WIN1255 WIN1258
EUC_CN EUC_KR GBK ISO_8859_7 KOI8R LATIN10 LATIN4 LATIN7 MULE_INTERNAL SQL_ASCII WIN1250 WIN1253 WIN1256 WIN866
EUC_JIS_2004 EUC_TW ISO_8859_5 ISO_8859_8 KOI8U LATIN2 LATIN5 LATIN8 SHIFT_JIS_2004 UHC WIN1251 WIN1254 WIN1257 WIN874
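Since no UTF-16/UCS-2 name appears in that list, the file has to be converted before COPY can read it. A sketch, assuming the file really does start with the BOM that Notepad++ reports, that sniffs the BOM and rewrites the file as UTF-8:

```python
import codecs

def convert_to_utf8(src_path: str, dst_path: str) -> str:
    """Sniff a UTF-16/UCS-2 BOM, rewrite the file as UTF-8, and
    return the encoding that was detected (falls back to UTF-8)."""
    with open(src_path, "rb") as f:
        head = f.read(2)
    if head == codecs.BOM_UTF16_LE:
        detected = "utf-16-le"
    elif head == codecs.BOM_UTF16_BE:
        detected = "utf-16-be"
    else:
        detected = "utf-8"
    # The generic "utf-16" codec consumes the BOM for us when present.
    read_enc = "utf-16" if detected.startswith("utf-16") else "utf-8"
    with open(src_path, encoding=read_enc) as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(src.read())
    return detected
```

Then run the COPY against the converted file with ENCODING 'UTF8'.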
I am trying to read in some files from the following directory structure:
/jc/06 Önéletrajzok/Profession/Előszűrés sablonok név szerint
But for some strange reason I cannot descend all the way into that directory tree.
I have already tried with PHP, Python 3.6, and Ruby, but without much luck. With PHP and Python I can at least CWD as far as the /jc/06 Önéletrajzok/Profession part.
Here is my python code for reference:
from ftplib import FTP
ftp = FTP('hostname')
ftp.login('username','pwd')
ftp.cwd('jc') # Just for demonstration purposes as step by step
ftp.cwd('06 Önéletrajzok')
ftp.cwd('Profession')
print(ftp.nlst()[2]) # Which gives: 'ElÅ\x91szűrés sablonok név szerint'
# But when I am trying:
ftp.cwd('ElÅ\x91szűrés sablonok név szerint')
# Or either:
ftp.cwd('Előszűrés sablonok név szerint')
# It gives:
# UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151' in position 6: ordinal not in range(256)
# So I am trying encoding CP1250 or CP852 (for Hungarian)
dir = 'Előszűrés sablonok név szerint'.encode('cp852') # which gives: b'El\x8bsz\xfbr\x82s sablonok n\x82v szerint'
ftp.cwd(dir.decode('utf-8'))
# and it gives the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 2: invalid start byte
So I am starting to give up on this one; I don't know how to access those files. The directory structure was created by Windows laptops accessing a Synology file server.
I have already tried with ftp.encoding = "utf-8" too.
Any ideas?
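One observation: the listing you get back ('ElÅ\x91szűrés' is exactly "Előszűrés" whose UTF-8 bytes were decoded as Latin-1) suggests the server speaks UTF-8 while ftplib is using Latin-1, which was its hard-coded command encoding before Python 3.9. If setting ftp.encoding = "utf-8" does not take effect on your Python version, a round-trip trick can smuggle the UTF-8 bytes through the Latin-1 codec unchanged. A sketch, under that assumption:

```python
def ftp_path(name: str) -> str:
    """ftplib before Python 3.9 encodes commands as Latin-1. If the
    server expects UTF-8 names, disguise the UTF-8 bytes as Latin-1
    so they pass through the Latin-1 encoder byte-for-byte."""
    return name.encode("utf-8").decode("latin-1")

# usage: ftp.cwd(ftp_path('Előszűrés sablonok név szerint'))
```

This is the mirror image of the mojibake in your listing, which is why the nlst() output looked the way it did.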
I tried to add a Russian dictionary for full-text search in a PostgreSQL db. I've downloaded the dict files, converted them to UTF-8, and tried to create the new dictionary:
$ iconv -f koi8-r -t utf-8 < ru_RU.aff > /opt/local/share/postgresql93/tsearch_data/russian.affix
$ iconv -f koi8-r -t utf-8 < ru_RU.dic > /opt/local/share/postgresql93/tsearch_data/russian.dict
CREATE TEXT SEARCH DICTIONARY russian_ispell (
TEMPLATE = ispell,
DictFile = russian,
AffFile = russian,
StopWords = russian
);
But got an ERROR:
ERROR: invalid byte sequence for encoding "UTF8": 0xd1
CONTEXT: line 341 of configuration file "/opt/local/share/postgresql93/tsearch_data/russian.affix": "SFX Y хаться шутся хаться"
Then I tried other Russian dicts, but the same error occurred. How can I handle this error? Thanks.
You can try executing the following command:
export LC_ALL=C
I think you have a locale issue. This command should be executed in the same command-line session where you run the commands for creating the dictionary.
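Whatever the cause, it helps to confirm which lines of the converted file are actually not valid UTF-8 before retrying (the server reports only the first offender, line 341 here). A small sketch that scans a dictionary or affix file byte-by-byte:

```python
def find_bad_lines(path: str) -> list:
    """Return the 1-based line numbers whose raw bytes are not valid
    UTF-8, to spot leftovers from an incomplete iconv conversion."""
    bad = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, 1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(lineno)
    return bad
```

If this reports offending lines even after iconv, the source file probably was not uniformly koi8-r to begin with, and those lines need a different source encoding.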