ora2pg (Unicode surrogate U+DBC0 is illegal in UTF-8)

I am extracting data from Oracle with ora2pg and get the following warnings:
Unicode surrogate U+DBC0 is illegal in UTF-8 at /usr/lib64/perl5/IO/Handle.pm line 420.
Unicode surrogate U+DF72 is illegal in UTF-8 at /usr/lib64/perl5/IO/Handle.pm line 420.
Unicode surrogate U+DBC0 is illegal in UTF-8 at /usr/lib64/perl5/IO/Handle.pm line 420.
Unicode surrogate U+DF72 is illegal in UTF-8 at /usr/lib64/perl5/IO/Handle.pm line 420.
Unicode surrogate U+DBC0 is illegal in UTF-8 at /usr/lib64/perl5/IO/Handle.pm line 420.
And on import, I get this error:
FATAL: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0x80
I've also tried importing directly into PostgreSQL through PG_DSN instead of writing to a file, but that didn't help either:
[========================>] 1/1 tables (100.0%) end of scanning.
DBD::Pg::db pg_putcopyend failed: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0x80
CONTEXT: COPY phy_t1, line 1131 at /var/lib/pgsql/PERL_DBI_DBD/lib64/perl5/Ora2Pg.pm line 14716.
FATAL: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0x80
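The byte sequence in these errors and the surrogates in the Perl warnings appear to be the same thing. The Python sketch below (an illustration, not part of ora2pg) checks this: 0xED 0xAF 0x80 is the three-byte encoding of the surrogate U+DBC0, which strict UTF-8 forbids, and the pair U+DBC0 U+DF72 from the warnings would stand for the single code point U+100372. That pattern suggests, though it does not prove, that the source data stores characters outside the Basic Multilingual Plane as two separately encoded surrogates (the CESU-8 form used by Oracle's older UTF8 character set, as opposed to AL32UTF8).

```python
# The three bytes PostgreSQL rejects (from the COPY error above).
bad = b"\xed\xaf\x80"

# A strict UTF-8 decoder refuses them: UTF-8 may not encode surrogates.
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 decode fails:", exc.reason)

# The 'surrogatepass' handler decodes them anyway, yielding the lone
# surrogate that the Perl warning names.
surrogate = bad.decode("utf-8", errors="surrogatepass")
print(f"U+{ord(surrogate):04X}")  # U+DBC0

# U+DBC0 followed by U+DF72 is a UTF-16 high/low surrogate pair; the
# standard formula combines them into one supplementary code point.
high, low = 0xDBC0, 0xDF72
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(f"U+{code_point:X}")  # U+100372
```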
When I run "iconv -f utf-16le -t UTF-8//TRANSLIT out.sql -o PHYCON_data.sql", I get an "out of memory" error. I've even raised work_mem from 4M to 200M, but the problem persists.
Any suggestions?
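One possible workaround, sketched in Python under the assumption that out.sql is UTF-8 apart from CESU-8-style surrogate pairs (and therefore probably not UTF-16LE, as the iconv command assumes): decode with Python's surrogatepass error handler, rejoin each pair through a UTF-16 round trip, and write proper four-byte UTF-8. The function names are hypothetical, and any other kind of invalid byte will still raise an error.

```python
def fix_surrogate_pairs(data: bytes) -> bytes:
    """Re-encode CESU-8-style surrogate pairs in `data` as 4-byte UTF-8."""
    # Accept 3-byte-encoded surrogates (e.g. b'\xed\xaf\x80' -> '\udbc0').
    text = data.decode("utf-8", errors="surrogatepass")
    # A UTF-16 round trip joins each high/low pair into one code point;
    # any unpaired surrogate is replaced with U+FFFD.
    text = text.encode("utf-16-le", errors="surrogatepass").decode(
        "utf-16-le", errors="replace"
    )
    return text.encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Copy `src` to `dst` line by line, fixing surrogate pairs on the way."""
    # Streaming keeps memory use small no matter how large the dump is.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            fout.write(fix_surrogate_pairs(line))
```

For example, `convert_file("out.sql", "PHYCON_data.sql")` would produce a file that can then be loaded with psql. Splitting on newlines is safe here because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so no surrogate pair is cut in half.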

Related

Buildroot mtools "codepage 850 Invalid argument"

I have enabled BR2_TOOLCHAIN_GLIBC_GCONV_LIBS_COPY,
but mtools still gives me this error:
Error converting to codepage 850 Invalid argument
Error setting code page
Cannot initialize 'A:'
Why?

unoconv with --stdin not working

I am using unoconv to convert DOCX to PDF. Everything works as long as I pass the document by file name:
$ unoconv -f pdf --stdout test.docx
But as soon as I use --stdin, it stops working:
$ unoconv -f pdf --stdin --stdout < test.docx
Traceback (most recent call last):
  File "/usr/bin/unoconv", line 1275, in <module>
    main()
  File "/usr/bin/unoconv", line 1185, in main
    inputfn = sys.stdin.read()
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte
It's the same file. Why doesn't this work?
Here is the file: https://nofile.io/f/bKz1zWf745K/test.docx

I think the problem is that the --stdin option doesn't do what one probably thinks it does.
In the error message, the variable name in line 1185 looks suspicious:
inputfn = sys.stdin.read()
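And indeed, checking the source code, it seems that the text read from STDIN is interpreted as the file name, not the file content. Reading it through the text-mode sys.stdin also makes Python decode the binary .docx (a ZIP archive) as UTF-8 first, which is what raises the UnicodeDecodeError.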
However, the documentation (man unoconv) is misleading:
--stdin
Read input file from stdin (filenames are ignored if provided)
This really doesn't sound like the input is interpreted as a file name.
I suggest that you file a bug report about this (maybe first check if there is one already).
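For comparison, here is a minimal Python 3 sketch of how a script could accept document contents on stdin: read the raw bytes from the stream's binary buffer instead of calling sys.stdin.read(), then save them to a temporary file that a file-name-based converter can open. This is not unoconv's code or a patch for it; read_stdin_bytes and stdin_to_tempfile are hypothetical helpers.

```python
import io
import sys
import tempfile


def read_stdin_bytes(stream: io.TextIOWrapper = sys.stdin) -> bytes:
    """Return the raw bytes of a text stream without decoding them."""
    # stream.read() would decode with the stream's encoding (UTF-8 in the
    # traceback above) and fail on binary data such as a .docx file.
    # The underlying binary buffer returns the bytes unchanged.
    return stream.buffer.read()


def stdin_to_tempfile(stream: io.TextIOWrapper = sys.stdin,
                      suffix: str = ".docx") -> str:
    """Save the stream's bytes to a temporary file and return its path."""
    data = read_stdin_bytes(stream)
    # delete=False keeps the file after closing so another step can open it;
    # the caller is responsible for removing it afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
    return tmp.name
```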

Unicode -- Copyright Symbol

I'm trying to represent the copyright symbol © in Python.
If I type © into the Python interactive terminal, I get '\xc2\xa9'. These are the bytes 194 and 169 in decimal (0xC2 and 0xA9).
But if I look up the copyright symbol in the Unicode table, it's only 169.
In the Python interactive terminal:
ord(u"©") --> 169
However '\xa9' == "©" --> False
Only '\xc2\xa9' == "©" --> True
I don't really get why 194 and 169 together give the copyright symbol, instead of just 169 or just 194.

Your terminal supports UTF-8 encoding, and you are likely using Python 2:
>>> import sys
>>> sys.stdout.encoding
'utf-8'
>>> '©'
'\xc2\xa9'
>>> u'©'
u'\xa9'
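In Python 2, plain string literals are byte strings, so characters you type are stored encoded in the terminal's encoding. In UTF-8, © (U+00A9, code point 169) is encoded as the two bytes 0xC2 0xA9. Use a Unicode string (u'©') to get the Unicode value.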
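The same distinction is explicit in Python 3, where "..." is Unicode text and b"..." is bytes. A short sketch, assuming Python 3 and a UTF-8 source file:

```python
# In Python 3, a str literal is Unicode text, so "©" is one code point.
sign = "©"
print(len(sign), ord(sign), hex(ord(sign)))  # 1 169 0xa9

# Encoding it as UTF-8 gives the two bytes Python 2 displayed: 0xC2 0xA9.
encoded = sign.encode("utf-8")
print(encoded, list(encoded))  # b'\xc2\xa9' [194, 169]

# Decoding those two bytes as UTF-8 gives back the single character.
print(encoded.decode("utf-8") == sign)  # True
```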

Doxygen "todo:86: warning: Unexpected character"

I'm having problems with Doxygen giving me warning messages such as these:
todo:86: warning: Unexpected character `d'
todo:86: warning: Unexpected character `o'
todo:86: warning: Unexpected character `c'
todo:86: warning: Unexpected character `/'
todo:86: warning: Unexpected character `t'
todo:86: warning: Unexpected character `e'
todo:86: warning: Unexpected character `r'
scons: *** [doc/API/html/index.html] Doxygen errors encountered, see doxygen.warn for details
scons: building terminated because of errors.
If I look at doxygen.warn, it just contains the todo:86 lines again, which isn't exactly detail, eh? ;) Anyway, the characters are part of a path in my project, but they don't appear in the sources themselves. I have no idea where the warnings are coming from, and I'm stuck now, as our team enforces a "no warnings whatsoever" policy.
Any ideas on where to look for this?

Error when trying to restore backup

This line (518):
COPY wp_commentmeta (meta_id, comment_id, meta_key, meta_value) FROM stdin;
\.
is giving this error:
[ERROR ] 518.0: syntax error, unexpected character
What is this?
I have made backups of this database before, and now I'm just trying to restore all the tables back into the database.

The error:
ERROR: syntax error at or near "\"
LINE 1: ...a (meta_id, comment_id, meta_key, meta_value) FROM stdin; \.
^
********** Error **********
ERROR: syntax error at or near "\"
SQL state: 42601
Character: 77
It points to the \ in \. as the problem.
Are you sure you need the \.?
Per the documentation:
End of data can be represented by a single line containing just
backslash-period (\.). An end-of-data marker is not necessary when
reading from a file, since the end of file serves perfectly well; it
is needed only when copying data to or from client applications using
pre-3.0 client protocol.
Try removing your \. from the line and see if your copy works as expected.
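To illustrate the documented rule, here is a simplified Python sketch of how a reader of COPY text-format data can treat the end-of-data marker: rows are tab-separated lines, and the data ends either at a line containing only \. or at the end of the input. The function is hypothetical, ignores COPY's backslash escapes and \N nulls, and is not how PostgreSQL itself implements COPY.

```python
from typing import Iterable, List


def read_copy_rows(lines: Iterable[str]) -> List[List[str]]:
    """Collect tab-separated rows until a "\\." line or the end of input."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "\\.":
            # End-of-data marker: whatever follows is not COPY data.
            break
        rows.append(line.split("\t"))
    return rows


# With a marker, the SQL after it is not treated as data.
print(read_copy_rows(["1\t7\tkey\tvalue\n", "\\.\n", "SELECT 1;\n"]))
# [['1', '7', 'key', 'value']]

# Without a marker, the end of input ends the data just as well.
print(read_copy_rows(["1\t7\tkey\tvalue\n"]))
# [['1', '7', 'key', 'value']]

# An empty block like the one on line 518: the marker alone means zero rows.
print(read_copy_rows(["\\.\n"]))
# []
```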

We Keep Coding