PostgreSQL UTF8 Handling - postgresql

From time to time my PostgreSQL DB reports a strange error:
[client] postgres7 error: [-1: ERROR: invalid byte sequence for encoding \"UTF8\": 0xb4
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by \"client_encoding\".] in adodb_throw(INSERT INTO
page_comments(pageid, pagetype, sender_name, sender_mail, sender_url, comment, owner_uid, owner_gid, sortorder, level, parent)
VALUES(
1493,
102,
\'alexis\',
\'xxx#xxx.es\',
\'\',
\'Next friday i´ll visit Barcelona so in case you need one of this mugs please let me know.\',
1000,
1000,
1,
1,
NULL
), )
Now, I see that it comes from the funny apostrophe sign. Yet I am totally confused: the DB was initialized in UTF8, the web application serves UTF8 pages, and, moreover, the content is even utf8_encoded before it is pushed into the database.
Does anybody know how to avoid this error?

U+00B4, ACUTE ACCENT, is encoded as '\xb4' in ISO-8859-1. In UTF-8, it would be '\xc2\xb4'. So some part of your application changes the encoding to Latin-1. Find and fix that place, and the error should go away.
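The byte values in the answer can be checked with a few lines of Python. This is only an illustrative sketch; the helper name is_valid_utf8 is made up for the example and is not part of PostgreSQL, PHP or ADOdb.

```python
# U+00B4 ACUTE ACCENT, the "funny apostrophe" from the failing INSERT.
ch = "\u00b4"

latin1_bytes = ch.encode("latin-1")  # single byte 0xb4
utf8_bytes = ch.encode("utf-8")      # two bytes 0xc2 0xb4


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xb4 is a continuation byte without a lead byte, so a server
# expecting UTF-8 has to reject it; the two-byte form is accepted.
print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

Running the same kind of check on the raw request bytes, before they reach the database layer, is one way to find where the Latin-1 data enters the application.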

Related

Cannot create user at runtime with Query with parameters

I am using C++ Builder 10.2.3 (Rad Studio Tokyo 10.2.3) with Interbase 2017
I need to create users at runtime for my users registration.
If I create the Query at runtime, in which case there is no parameter, it works. But this creates problems with MBCS characters, which I will explain later.
If I create the Query at design-time with parameters and try to set the parameters at runtime, I get the error message below:
[Application: ]
[Error] -104 335544569 Dynamic SQL Error
SQL error code = -104
Token unknown - line 2, char 14
?
The query I am using is below:
CREATE USER myuser
SET PASSWORD :mypass,
FIRST NAME :myfirstname,
LAST NAME :myname;
I replace the first line of the Query at runtime, so there is no character. And after all, Interbase cannot handle MBCS characters in USERNAME.
I need to use a Query with parameters because my application handles multi-byte characters (MBCS), like Chinese and Japanese, and this is the only way to be sure of a proper conversion to UTF8 in Interbase. If the conversion of MBCS characters is not done, I cannot back up and restore my database: when I try to restore with MBCS characters in the first and last name, I get an error message that Interbase cannot transliterate between character sets.
Based on the error message, it appears to me that it does not recognize the Query parameters.
I tried with both "TIBQuery" and "TIBSQL". Same issue. Stored procedures are not an option either: they do not recognize the CREATE keyword.
So, how can I fix that?

Mailkit SearchQuery IMAP BAD Command Argument Error. 11

Is there anything wrong with this query?
zSQry = SearchQuery.Seen.And(SearchQuery.SubjectContains("spain").And(SearchQuery.DeliveredAfter(New Date(2017, 3, 11))))
I'm getting "BAD" from the server:
S: A00000005 OK [READ-ONLY] EXAMINE completed.
C: A00000006 UID SEARCH CHARSET US-ASCII SEEN SUBJECT spain SINCE 11-MAR-2017
S: A00000006 BAD Command Argument Error. 11
C: A00000007 LOGOUT
S: * BYE Microsoft Exchange Server 2016 IMAP4 server signing off.
Or is this kind of search not valid?
Oops, my bad: a little mistake in my search query construction. It is interesting to note, though, because there was no compilation error in VS2005.
This is the initial search query. Nothing is wrong with the syntax in the IDE, but it produces an error at the server:
SearchQuery.Seen.And(SearchQuery.SubjectContains("paulistana").And(SearchQuery.DeliveredAfter(New Date(2017, 3, 11))))
Now with a small change (a closing parenthesis moved to right after SubjectContains("paulistana")), it works perfectly:
SearchQuery.Seen.And(SearchQuery.SubjectContains("paulistana")).And(SearchQuery.DeliveredAfter(New Date(2017, 3, 11)))
The first produces (double space between SEEN and SUBJECT):
C: A00000006 UID SEARCH CHARSET US-ASCII SEEN  SUBJECT spain SINCE 11-MAR-2017
The second produces (single space):
C: A00000006 UID SEARCH CHARSET US-ASCII SEEN SUBJECT spain SINCE 11-MAR-2017
Oops, hold on, I've been using MailKit for a long time, since 2015.
That's why you can't reproduce it with the current version.
I'm on 1.2.12.0, so I have to reconsider and upgrade.
I'll check with the current version.
No need to open an issue; thanks for the reminder.

SAP Unicode: Offset exceed

I have some account issues on SCN, so I'm making an attempt here.
We switched to Unicode and ran into some issues. The statement INFTY_TAB = PS+2. raises an error that "the offset + length is exceeding".
I found some hints but couldn't really figure out how to fix this. And even when I manage to fix those errors, I get a new error: 'Iclude-Report %HR_P9002 not found'. The IT is still there, so is there something else I can check?
Definition of PS:
DATA: BEGIN OF PS OCCURS 0.
*This indicates if a record was read with disabled authority check.
data: authc_disabled(1) type c.
DATA: TCLAS LIKE PSPAR-TCLAS.
INCLUDE STRUCTURE PRELP.
DATA: ACRCD LIKE SY-SUBRC.
DATA: END OF PS.
TCLAS is a char(1) field.
This is the part where the error pops up:
INFTY_TAB = PS+2.
Error (I had to translate it, so sorry for any mistakes):
Offset and Length (=2432) exceed the length of the character based beginning (=2430) of the structure.
It depends on the length of INFTY_TAB. You have to set the length explicitly:
INFTY_TAB = PS+2(length).
Official information is here. The important point to note is that the inclusion of SY-SUBRC (which is an INT4 field) limits the range of fields you can access using this (discouraged) method of access.
ASSIGN field+off TO <fs> is generally forbidden from a syntactical point of view since any offset <> 0 would cause the range to be exceeded.
Although the sentence above refers to the ASSIGN command, it also applies to this situation.
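The numbers in the error message can be read as a simple bounds check. The sketch below is written in Python rather than ABAP and only illustrates the arithmetic: it assumes the reported 2432 is the offset 2 plus a target length of 2430, which is an inference from the message and not something stated in the post. The function names are hypothetical.

```python
def access_fits(char_prefix_len: int, offset: int, length: int) -> bool:
    """True if offset + length stays inside the character-based beginning."""
    return offset + length <= char_prefix_len


def max_length_at_offset(char_prefix_len: int, offset: int) -> int:
    """Largest length that can be given explicitly at the given offset."""
    if not 0 <= offset <= char_prefix_len:
        raise ValueError("offset lies outside the character-based beginning")
    return char_prefix_len - offset


# Values taken from the error message: character-based beginning = 2430,
# offset = 2. The target length of 2430 is an assumption (2432 - 2).
print(access_fits(2430, 2, 2430))     # the failing access
print(max_length_at_offset(2430, 2))  # upper bound for PS+2(length)
```

Under that reading, any explicit length in PS+2(length) has to stay at or below the value returned by max_length_at_offset.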

Loading error of tdbloader2: Illegal character in IRI

I'm trying to replicate DBpedia for an experiment.
I downloaded the latest DBpedia dataset from: http://downloads.dbpedia.org/2015-10/core/
and stored the files in a directory dbp_201510/.
I tried to load the dataset using tdbloader2.
tdbloader2 --loc tdb dbp_201510/*
However, I receive the following error.
ERROR [line: 2, col: 145] Illegal character in IRI (codepoint 0x60, '`'): <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/[`]...>
org.apache.jena.riot.RiotException: [line: 2, col: 145] Illegal character in IRI (codepoint 0x60, '`'): <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/[`]...>
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:165)
at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:108)
at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:71)
at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:58)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:176)
at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:861)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:667)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:637)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:626)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:617)
at org.apache.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:165)
at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at org.apache.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:85)
In addition, I receive a lot of warnings as below.
WARN [line: 92881, col: 1 ] Bad IRI: <http://dbpedia.org/resource/Ranma_½> Code: 56/COMPATIBILITY_CHARACTER in PATH: TODO
WARN [line: 92882, col: 1 ] Bad IRI: <http://dbpedia.org/resource/Ranma_½> Code: 47/NOT_NFKC in PATH: The IRI is not in Unicode Normal Form KC.
I use Apache Jena 3.0.1.
I'm looking for a way to avoid this error.
In addition, is there a good way to load the data without warnings?
I did the same thing for the previous version of DBpedia (http://downloads.dbpedia.org/2015-04/core/) and loading completed successfully without any warnings or errors.
The data should be made legal before loading. The 0x60, '`' is not legal in a URI. Maybe you want to replace it with %60 (it is then a different URI).
In many large datasets, the data isn't perfect. It is worth checking it before loading using "riot --validate".
The warnings are just warnings, not errors, and indicate that the UTF-8 is not in the standard's preferred form and might cause matching problems later. It looks like ½ can be written in different ways in UTF-8.
(I'm sure the DBpedia team would appreciate some feedback.)
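A rough pre-processing sketch in Python, assuming the suggestion above is followed literally: replace the backtick with %60 and check whether a string is already in Unicode Normal Form KC, which is what the NOT_NFKC warning refers to. The function names are hypothetical, and the script is not part of Jena; as the answer notes, rewriting an IRI produces a different IRI.

```python
import unicodedata


def escape_backticks(iri: str) -> str:
    """Replace the backtick (0x60), which is illegal in an IRI, with %60.

    The result is a different IRI than the original.
    """
    return iri.replace("`", "%60")


def is_nfkc(text: str) -> bool:
    """True if text is unchanged by NFKC normalization."""
    return unicodedata.normalize("NFKC", text) == text


# example.org is a placeholder, not an IRI from the dataset.
print(escape_backticks("http://example.org/people/a`b"))

# U+00BD (the one-half sign) has a compatibility decomposition, so an IRI
# containing it is not in NFKC, matching the warning in the question.
print(is_nfkc("http://dbpedia.org/resource/Ranma_\u00bd"))
```

Whether to normalize such IRIs or just accept the warnings depends on how the data will be matched later; the sketch only shows how to detect the cases.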

Elixir - Postgres: invalid byte sequence for encoding "UTF8"

I'm currently working on an elixir project that parses XML from an API and inserts data into postgres using postgrex.
Most inserts work fine; however, for the odd insert I get this error. I've seen a lot of other people facing this error, but I'm not too sure how to solve it in Elixir.
23:52:32.402 [error] Process #PID<0.224.0> raised an exception
** (KeyError) key :constraint not found in: %{code: :character_not_in_repertoire, file: "wchar.c", line: "2011", message: "invalid byte sequence for encoding \"UTF8\": 0xe3 0x83 0x22", pg_code: "22021", routine: "report_invalid_encoding", severity: "ERROR"}
(pipeline_processor) lib/worker.ex:133: PipelineProcessor.Worker.recursive_db_insert/1
(pipeline_processor) lib/worker.ex:47: PipelineProcessor.Worker.process_article/1
(pipeline_processor) lib/worker.ex:17: PipelineProcessor.Worker.request_article/0
I'm aware that the error is actually due to accessing an invalid property of the map. However I'm trying to solve the issue that postgrex is giving.
My postgrex insert code:
sql_string = "INSERT INTO articles (title, source, content) VALUES ($1, $2, $3) RETURNING id"
{:ok, pid} = Postgrex.Connection.start_link(Application.get_env(:pipeline_processor, :db_details))
response = Postgrex.Connection.query(
pid,
sql_string,
[article.title, article.source, article.content]
)
Postgrex.Connection.stop(pid)
Is there any way in Elixir to scrub out invalid bytes so that these inserts can succeed? Or some way to have postgres handle it?
Thanks
As you already guessed, postgres is complaining that you are inserting invalid UTF8 into a text type. I would first try to fix the bad encoding at the source; if you cannot do that, you can use a combination of String.codepoints/1 and String.valid_character?/1 to either scrub or escape the invalid bytes.
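The scrubbing idea can be sketched as follows. The example is in Python rather than Elixir, so it illustrates the technique and is not a drop-in for the postgrex code above; scrub_utf8 is a hypothetical helper, and the sample input is made up around the bytes 0xe3 0x83 0x22 from the error message (a truncated three-byte sequence followed by a double quote).

```python
def scrub_utf8(raw: bytes, replacement: bool = True) -> str:
    """Decode raw as UTF-8, handling invalid byte sequences.

    With replacement=True, invalid sequences become U+FFFD;
    with replacement=False, they are dropped silently.
    """
    mode = "replace" if replacement else "ignore"
    return raw.decode("utf-8", errors=mode)


# Valid text, then the truncated sequence 0xe3 0x83, then a double quote.
bad = b'caf\xc3\xa9 \xe3\x83"'

dropped = scrub_utf8(bad, replacement=False)
marked = scrub_utf8(bad)

# Both results are valid text and can be re-encoded as clean UTF-8.
print(dropped.encode("utf-8"))
print(marked.encode("utf-8"))
```

Replacing with U+FFFD keeps a visible marker where data was lost, which can help when tracking down the upstream source of the bad bytes; dropping is simpler but hides the damage.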