Elixir - Postgres: invalid byte sequence for encoding \"UTF8\ - postgresql

I'm currently working on an elixir project that parses XML from an API and inserts data into postgres using postgrex.
Most inserts work fine, however for the odd insert I get this error. I've seen a lot of other people facing this error, but I'm not to sure how to solve it in Elixir.
23:52:32.402 [error] Process #PID<0.224.0> raised an exception
** (KeyError) key :constraint not found in: %{code: :character_not_in_repertoire, file: "wchar.c", line: "2011", message: "invalid byte sequence for encoding \"UTF8\": 0xe3 0x83 0x22", pg_code: "22021", routine: "report_invalid_encoding", severity: "ERROR"}
(pipeline_processor) lib/worker.ex:133: PipelineProcessor.Worker.recursive_db_insert/1
(pipeline_processor) lib/worker.ex:47: PipelineProcessor.Worker.process_article/1
(pipeline_processor) lib/worker.ex:17: PipelineProcessor.Worker.request_article/0
I'm aware that the error is actually due to accessing an invalid property of the map. However I'm trying to solve the issue that postgrex is giving.
My postgrex insert code:
sql_string = "INSERT INTO articles (title, source, content) VALUES ($1, $2, $3) RETURNING id"
{:ok, pid} = Postgrex.Connection.start_link(Application.get_env(:pipeline_processor, :db_details))
response = Postgrex.Connection.query(
pid,
sql_string,
[article.title, article.source, article.content]
)
Postgrex.Connection.stop(pid)
Is there anyway in Elixir to scrub out invalid bytes so that these inserts can succeed? Or for some way to have postgres handle it?
Thanks

As you already guessed postgres is complaining that you are inserting invalid UTF8 into a text type. I would initially try to fix the bad encodings if you cannot do that you can use a combination of String.codepoints/1 and String.valid_character?/1 to either scrub or escape the invalid bytes.

Related

Why do I get a "unterminated quoted string at or near" error using python postgresql, and not in pgadmin, whith the same request?

I have this query :
INSERT INTO lytnobjects.devices (id,idedge,uniqueref,constructeur,ipaddress,macaddress,
hostname,devicetype,isfirewall,isvisible,iscorporate,
ishub,osname,osversion,datecreation,lasttrafic,
hourtrafic,daytrafic,monthtrafic)
VALUES ('e1e455e98b6ed0037a58d0c1f5dc245a',3183,'TODO','TODO','192.168.143.49',
'b0:0c:d1:bb:36:1c','HPBB361C','Other',False,False,False,False,'','',
'2021-10-29T00:58:53.709','2021-01-01T00:00:00','0/0','0/0','0/0')
When I execute the query using python 3.9 and psycopg2_binary (PostgreSQL), I get an error :
unterminated quoted string at or near "'HPBB361C"
conn is the opened connection to the database (AWS RDS PostgreSQL)
sql is a string with the query above
def SQLExec(conn,sql):
try: cur = conn.cursor()
cur.execute(sql)
except (Exception, psycopg2.DatabaseError) as error:
print("***** ERROR:",error)
cur.close()
If I execute the same request from pgAdmin, I get no error !
There is no missing quote as you can see in the query, and no reason to point an error at this place!
So, I have a string (sql) with the query ("INSERT ...")
I call execute from psycopg2, and get an error: unterminated quoted string at or near "'HPBB361C"
I copy/paste the same string into pgAdmin, and the query is executed with no error
The same string (query)
Any idea why I get an error from my python app?
I am looking for an answer since many hours, but find no explanation, and I don't know how to fix the problem (which doesn't exist for me)
Your help is very appreciated
Thank you
I finaly found the answer!
I build the sql query (string) using some variables coming from various sources, like Amazon S3 for instance.
I assumed that the variable was really a string, with nothing "bizarre" in it... But in fact, sometimes, the "string" was ended with a "\x00" char, that is not displayed, so the string looks just normal :-/
When I execute my query (string) with psycopg2, it receives the extra \x00 char, which ends the string at this place! This is why it says there is a missing quote
I put a trace in the code to display the .encode() version of my string, and it revealed the \x00 at the end. So now I "clean" all string variables used in my queries, just with myvariable.replace("\x00","")
And it works now. There is probably a more conventional way to fix this...
I hope it may help somebody sometime! ;-)

Cannot create user at runtime with Query with parameters

I am using C++ Builder 10.2.3 (Rad Studio Tokyo 10.2.3) with Interbase 2017
I need to create users at runtime for my users registration.
If I create the Query at runtime, in that case there is no parameter, it works. But this creates problems with MBCS characters I will explain later.
If I create the Query at design-time with parameters and try to set the parameters at runtime. I am getting the error message below:
[Application: ]
[Error] -104 335544569 Dynamic SQL Error
SQL error code = -104
Token unknown - line 2, char 14
?
The query I am using is below:
CREATE USER myuser
SET PASSWORD :mypass,
FIRST NAME :myfirstname,
LAST NAME :myname;
I replace the first line of the Query at runtime, so there is no character. And after all, Interbase cannot handle MBCS characters in USERNAME.
I need to use a Query with parameters because my application handles multi-bytes characters (MBCS), like Chinese and Japanese. And this is the only option to be sure of a proper conversion to UTF8 in Interbase. Because if the conversion of MBCS characters is not done, I cannot backup and restore my database. When I try to restore with MBCS characters in First and last name, I am getting an error message that Interbase cannot transliterate between character sets.
Base on the error message, it appears to me that it does not recognize the Query parameters.
I tried with both "TIBQuery" and "TIBSQL". Same issue. Impossible to use also Store procedures. Does not recognize the create word.
So, how to fix that ?

Add a missing key to JSON in a Postgres table via Rails

I'm trying to use update_all to update any records that is missing a key in a JSON stored in a table cell. ids is the ids of those records and I've tried the below...
User.where(id: ids).
update_all(
"preferences = jsonb_set(preferences, '{some_key}', 'true'"
)
Where the error returns is...
Caused by PG::SyntaxError: ERROR: syntax error at or near "WHERE"
LINE 1: ...onb_set(preferences, '{some_key}', 'true' WHERE "user...
The key takes a string value so not sure why the query is failing.
UPDATE:
Based on what was mentioned, I added the parentheses and also added / modified the last two arguments...
User.where(id: ids).
update_all(
"preferences = jsonb_set(preferences, '{some_key}', 'true'::jsonb, true)"
)
still running into issues and this time it seems related to the key I'm passing
I know this key doesn't currently exist for the set of ids
I added true for create_missing so that 1 isn't an issue
I get this error now...
Caused by PG::UndefinedFunction: ERROR: function jsonb_set(hstore, unknown, jsonb, boolean) does not exis
some_key should be a key in preferences
You're passing in raw SQL so you are 100% responsible for ensuring that is actually valid SQL. What you have there isn't. Check your parentheses:
User.where(id: ids).
update_all(
"preferences = jsonb_set(preferences, '{some_key}', 'true')"
)
If you look more closely at the error message it was telling you there was a problem precisely at the introduction of the WHERE clause, and right after ...true' so that was a good place to look for problems.
Syntax errors like this can be really annoying, but don't forget your database will usually do its best to pin down the place where the problem occurs.

pgp_sym_encrypt/pgp_sym_decrypt error handling

I had been using MySQL as database and had planned to move to postgresql. I had used aes_encrypt and aes_decrypt functions in MySQL extensively throughout my application. So whenever the encryption/decrytion fails, MySQL automatically returns 'null'.
I am unsure how to handle the same in postgresql. Tried using the pgp_sym_encrypt/pgp_sym_decrypt functions. If the encryption key is wrong, it throws error "Wrong key/corrupt data". I tried searching for some functions that could capture this error and return 'null' as in MySQL so that I need not modify my code. I had been searching but could not find one.
Has anybody used any error handling mechanism for individual queries? I had found that error handling can be done for procedures. But, I had to completely rewrite the entire application for that.
If you could share some details, it would be of great help. Thanks.
If you wish to avoid modifying your code and have the functions return NULL on error, you can do this by wrapping them in a PL/PgSQL function that uses a BEGIN ... EXCEPTION block to trap the error.
To do this, first I get the SQLSTATE for the error:
regress=# \set VERBOSITY verbose
regress=# SELECT pgp_sym_decrypt('fred','key');
ERROR: 39000: Wrong key or corrupt data
LOCATION: decrypt_internal, pgp-pgsql.c:607
I could use this directly in the error handler, but I prefer to use a symbolic name, so I look up the error name associated with 39000 in Appendix A - Error codes, finding that it's the generic function call error external_routine_invocation_exception. Not as specific as we would've liked, but it'll do.
Now a wrapper function is required. Something like this must be defined, with one function for each overloaded signature of pgp_sym_decrypt that you wish to support. For the (bytea,text) form that returns text, for example:
CREATE OR REPLACE FUNCTION pgp_sym_decrypt_null_on_err(data bytea, psw text) RETURNS text AS $$
BEGIN
RETURN pgp_sym_decrypt(data, psw);
EXCEPTION
WHEN external_routine_invocation_exception THEN
RAISE DEBUG USING
MESSAGE = format('Decryption failed: SQLSTATE %s, Msg: %s',
SQLSTATE,SQLERRM),
HINT = 'pgp_sym_encrypt(...) failed; check your key',
ERRCODE = 'external_routine_invocation_exception';
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
I've chosen to preseve the original error in a DEBUG level message. Here's a comparison of the original and wrapper, with full message verbosity and debug level output.
Enable debug output to show the RAISE. Note that it also shows the *original query text of the pgp_decrypt_sym call, including parameters.
regress=# SET client_min_messages = DEBUG;
New wrapped function still reports the error if detailed logging is enabled, but returns NULL:
regress=# SELECT pgp_sym_decrypt_null_on_err('redsdfsfdsfd','bobsdf');
LOG: 00000: statement: SELECT pgp_sym_decrypt_null_on_err('redsdfsfdsfd','bobsdf');
LOCATION: exec_simple_query, postgres.c:860
DEBUG: 39000: Decryption failed: SQLSTATE 39000, Msg: Wrong key or corrupt data
HINT: pgp_sym_encrypt(...) failed; check your key
LOCATION: exec_stmt_raise, pl_exec.c:2806
pgp_sym_decrypt_null_on_err
-----------------------------
(1 row)
compared to the original, which fails:
regress=# SELECT pgp_sym_decrypt('redsdfsfdsfd','bobsdf');
LOG: 00000: statement: SELECT pgp_sym_decrypt('redsdfsfdsfd','bobsdf');
LOCATION: exec_simple_query, postgres.c:860
ERROR: 39000: Wrong key or corrupt data
LOCATION: decrypt_internal, pgp-pgsql.c:607
Note that both forms show the parameters the function was called with when it failed. The parameters won't be shown if you've used bind parameters ("prepared statements"), but you should still consider your logs to be security critical if you're using in-database encryption.
Personally, I think it's better to do crypto in the app, so the DB never has access to the keys.

PostgreSQL UTF8 Handling

From time time to time my PostgreSQL DB is reporting a strange error:
[client] postgres7 error: [-1: ERROR: invalid byte sequence for encoding \"UTF8\": 0xb4
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by \"client_encoding\".] in adodb_throw(INSERT INTO
page_comments(pageid, pagetype, sender_name, sender_mail, sender_url, comment, owner_uid, owner_gid, sortorder, level, parent)
VALUES(
1493,
102,
\'alexis\',
\'xxx#xxx.es\',
\'\',
\'Next friday i´ll visit Barcelona so in case you need one of this mugs please let me know.\',
1000,
1000,
1,
1,
NULL
), )
Now, I see that it is coming from the funny apostrophe sign. Yet I am totally confused, as the DB was initialized in UTF8, the web application is serving UTF8 pages, and, moreover, the content is being even utf8_encoded before it is pushed into the database.
Does anybody know how to avoid this error?
U+00B4, ACUTE ACCENT, is encoded as '\xb4' in ISO-8859-1. In UTF-8, it would be '\xc2\xb4'. So some part of your application changes the encoding to Latin-1. Find and fix that place, and the error should go away.