What Do PostgreSQL INSERT confirmation params mean? - postgresql

In my LiveCode Server app I'm getting a dberr returned on insertion but no explicit error code.
I went on a terminal and did the insertion by hand as user Postgres.
%My_Dbase=# INSERT INTO new-table (first_name, last_name, anonymous) VALUES ('batman', 'Moonboy', TRUE);
The psql process returns:
INSERT 0 1
What does this line mean? Besides the primary table I also have a sequence to increment the primary key ID (int) of the main table.
If I check the data, the data is inserted, the primary key is increment by one and everything seems fine, I'm not sure why my app is returning an error (could be a bug in the app or my code).
But if I knew what INSERT 0 1 meant, that would help me assure myself that:
Yes, the insertion was done without errors, or
No, the 0 1 indicates an error of some sort.
If anyone has a link to the PostgreSQL doc which tells what these server response params are, I will study it... I have looked everywhere.

Excerpt from the relevant page in the manual:
Outputs
On successful completion, an INSERT command returns a command tag of the form
INSERT oid count
The count is the number of rows inserted. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. Otherwise oid is zero.

EDIT: I've been pointed at the documentation for this, https://www.postgresql.org/docs/14/protocol-message-formats.html
The format is COMMAND OID ROWS.
As OIDs system columns are not supported anymore, OID is always zero.
So what you are seeing is that you've done an insert, the OID was zero, and one row was inserted.
So in other words, your command is completing successfully!

Related

Range check error when creating GIST index on tsrange value

Fixing range bound data and creating gist tsrange index causes exception. I can guess the PostgreSQL sees old version of records and takes them into account when creating the gist index.
You can reproduce it using this script:
BEGIN;
CREATE TABLE _test_gist_range_index("from" timestamp(0) WITHOUT time zone, "till" timestamp(0) WITHOUT time zone);
--let's enter some invalid data
INSERT INTO _test_gist_range_index VALUES ('2021-01-02', '2021-01-01');
--let's fix the data
DELETE FROM _test_gist_range_index;
CREATE INDEX idx_range_idx_test2 ON _test_gist_range_index USING gist (tsrange("from", "till", '[]'));
COMMIT;
The result is:
SQL Error [22000]: ERROR: range lower bound must be less than or equal to range upper bound
db<>fiddle
I've tested this on all versions of PostgreSQL starting from v9.5 and ending with v13 using db<>fiddle. The result is the same on all of them.
The same error is received if we fix the data using "UPDATE".
Is there a way to fix the data and have range index on it? Maybe there is a way to clean the table somehow from those old values?..
EDIT
It seems that the exception is raised only if data correcting statements (DELETE in my example) and CREATE INDEX statement are in the same transaction. If I DELETE and COMMIT first, and then creating the index succeeds.
That is working as expected and not a bug.
When you delete a row in a PostgreSQL table, it is not actually deleted, but marked as invisible. Similarly, updating a row creates a new version of the row, but the old version is retained and marked invisible. This is the way how PostgreSQL implements multiversioning: concurrent transactions can still see the "invisible" data. Eventually, invisible rows are reclaimed by VACUUM.
Now a B-tree or GiST index contains one entry for each row version in the table, unless the row version is not visible by anybody (is dead). This explains why a deleted row will still cause an error if the data don't form a valid range.
If you run the statements in autocommit mode on an otherwise idle database, the deleted rows are dead, and no index entry has to be created.

serial in postgres is being increased even though I added on conflict do nothing

I'm using Postgres 9.5 and seeing some wired things here.
I've a cron job running ever 5 mins firing a sql statement that is adding a list of records if not existing.
INSERT INTO
sometable (customer, balance)
VALUES
(:customer, :balance)
ON CONFLICT (customer) DO NOTHING
sometable.customer is a primary key (text)
sometable structure is:
id: serial
customer: text
balance: bigint
Now it seems like everytime this job runs, the id field is silently incremented +1. So next time, I really add a field, it is thousands of numbers above my last value. I thought this query checks for conflicts and if so, do nothing but currently it seems like it tries to insert the record, increased the id and then stops.
Any suggestions?
The reason this feels weird to you is that you are thinking of the increment on the counter as part of the insert operation, and therefore the "DO NOTHING" ought to mean "don't increment anything". You're picturing this:
Check values to insert against constraint
If duplicate detected, abort
Increment sequence
Insert data
But in fact, the increment has to happen before the insert is attempted. A SERIAL column in Postgres is implemented as a DEFAULT which executes the nextval() function on a bound SEQUENCE. Before the DBMS can do anything with the data, it's got to have a complete set of columns, so the order of operations is like this:
Resolve default values, including incrementing the sequence
Check values to insert against constraint
If duplicate detected, abort
Insert data
This can be seen intuitively if the duplicate key is in the autoincrement field itself:
CREATE TABLE foo ( id SERIAL NOT NULL PRIMARY KEY, bar text );
-- Insert row 1
INSERT INTO foo ( bar ) VALUES ( 'test' );
-- Reset the sequence
SELECT setval(pg_get_serial_sequence('foo', 'id'), 0, true);
-- Attempt to insert row 1 again
INSERT INTO foo ( bar ) VALUES ( 'test 2' )
ON CONFLICT (id) DO NOTHING;
Clearly, this can't know if there's a conflict without incrementing the sequence, so the "do nothing" has to come after that increment.
As already said by #a_horse_with_no_name and #Serge Ballesta serials are always incremented even if INSERT fails.
You can try to "rollback" serial value to maximum id used by changing the corresponding sequence:
SELECT setval('sometable_id_seq', MAX(id), true) FROM sometable;
As said by #a_horse_with_no_name, that is by design. Serial type fields are implemented under the hood through sequences, and for evident reasons, once you have gotten a new value from a sequence, you cannot rollback the last value. Imagine the following scenario:
sequence is at n
A requires a new value : got n+1
in a concurrent transaction B requires a new value: got n+2
for any reason A rollbacks its transaction - would you feel safe to reset sequence?
That is the reason why sequences (and serial field) just document that in case of rollbacked transactions holes can occur in the returned values. Only unicity is guaranteed.
Well there is technique that allows you to do stuff like that. They call insert mutex. It is old old old, but it works.
https://www.percona.com/blog/2011/11/29/avoiding-auto-increment-holes-on-innodb-with-insert-ignore/
Generally idea is that you do INSERT SELECT and if your values are duplicating the SELECT does not return any results that of course prevents INSERT and the index is not incremented. Bit of mind boggling, but perfectly valid and performant.
This of course completely ignores ON DUPLICATE but one gets back control over the index.

Postgres 9.3: Sharelock issue with simple INSERT

Update: Potential solution below
I have a large corpus of configuration files consisting of key/value pairs that I'm trying to push into a database. A lot of the keys and values are repeated across configuration files so I'm storing the data using 3 tables. One for all unique key values, one for all unique pair values, and one listing all the key/value pairs for each file.
Problem:
I'm using multiple concurrent processes (and therefore connections) to add the raw data into the database. Unfortunately I get a lot of detected deadlocks when trying to add values to the key and value tables. I have a tried a few different methods of inserting the data (shown below), but always end up with a "deadlock detected" error
TransactionRollbackError: deadlock detected DETAIL: Process 26755
waits for ShareLock on transaction 689456; blocked by process 26754.
Process 26754 waits for ShareLock on transaction 689467; blocked by
process 26755.
I was wondering if someone could shed some light on exactly what could be causing these deadlocks, and possibly point me towards some way of fixing the issue. Looking at the SQL statements I'm using (listed below), I don't really see why there is any co-dependency at all. Thanks for reading!
Example config file:
example_key this_is_the_value
other_example other_value
third example yet_another_value
Table definitions:
CREATE TABLE keys (
id SERIAL PRIMARY KEY,
hash UUID UNIQUE NOT NULL,
key TEXT);
CREATE TABLE values (
id SERIAL PRIMARY KEY,
hash UUID UNIQUE NOT NULL,
key TEXT);
CREATE TABLE keyvalue_pairs (
id SERIAL PRIMARY KEY,
file_id INTEGER REFERENCES filenames,
key_id INTEGER REFERENCES keys,
value_id INTEGER REFERENCES values);
SQL Statements:
Initially I was trying to use this statement to avoid any exceptions:
WITH s AS (
SELECT id, hash, key FROM keys
WHERE hash = 'hash_value';
), i AS (
INSERT INTO keys (hash, key)
SELECT 'hash_value', 'key_value'
WHERE NOT EXISTS (SELECT 1 FROM s)
returning id, hash, key
)
SELECT id, hash, key FROM i
UNION ALL
SELECT id, hash, key FROM s;
But even something as simple as this causes the deadlocks:
INSERT INTO keys (hash, key)
VALUES ('hash_value', 'key_value')
RETURNING id;
In both cases, if I get an exception thrown because the inserted hash
value is not unique, I use savepoints to rollback the change and
another statement to just select the id I'm after.
I'm using hashes for the unique field, as some of the keys and values
are too long to be indexed
Full example of the python code (using psycopg2) with savepoints:
key_value = 'this_key'
hash_val = generate_uuid(value)
try:
cursor.execute(
'''
SAVEPOINT duplicate_hash_savepoint;
INSERT INTO keys (hash, key)
VALUES (%s, %s)
RETURNING id;
'''
(hash_val, key_value)
)
result = cursor.fetchone()[0]
cursor.execute('''RELEASE SAVEPOINT duplicate_hash_savepoint''')
return result
except psycopg2.IntegrityError as e:
cursor.execute(
'''
ROLLBACK TO SAVEPOINT duplicate_hash_savepoint;
'''
)
#TODO: Should ensure that values match and this isn't just
#a hash collision
cursor.execute(
'''
SELECT id FROM keys WHERE hash=%s LIMIT 1;
'''
(hash_val,)
)
return cursor.fetchone()[0]
Update:
So I believe I a hint on another stackexchange site:
Specifically:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time1. However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress). If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it2, otherwise it
will attempt to apply its operation to the updated version of the row.
While I'm still not exactly sure where the co-dependency is, it seems that processing a large number of key/value pairs without commiting would likely result in something like this. Sure enough, if I commit after each individual configuration file is added, the deadlocks don't occur.
It looks like you're in this situation:
The table to INSERT into has a primary key (or unique index(es) of any sort).
Several INSERTs into that table are performed within one transaction (as opposed to committing immediately after each one)
The rows to insert come in random order (with regard to the primary key)
The rows are inserted in concurrent transactions.
This situation creates the following opportunity for deadlock:
Assuming there are two sessions, that each started a transaction.
Session #1: insert row with PK 'A'
Session #2: insert row with PK 'B'
Session #1: try to insert row with PK 'B'
=> Session #1 is put to wait until Session #2 commits or rollbacks
Session #2: try to insert row with PK 'A'
=> Session #2 is put to wait for Session #1.
Shortly thereafter, the deadlock detector gets aware that both sessions are now waiting for each other, and terminates one of them with a fatal deadlock detected error.
If you're in this scenario, the simplest solution is to COMMIT after a new entry is inserted, before attempting to insert any new row into the table.
Postgres is known for that type of deadlocks, to be honest. I often encounter such problems when different workers update information about interleaving entities. Recently I had a task of importing a big list of scientific papers metadata from multiple json files. I was using parallel processes via joblib to read from several files at the same time. Deadlocks were hanging all the time on authors(id bigint primary key, name text) table all the time 'cause many files contained papers of the same authors, therefore producing inserts with oftentimes the same authors. I was using insert into authors (id,name) values %s on conflict(id) do nothing, but that was not helping. I tried sorting tuples before sending them to Postgres server, with little success. What really helped me was keeping a list of known authors in a Redis set (accessible to all processes):
if not rexecute("sismember", "known_authors", author_id):
# your logic...
rexecute("sadd", "known_authors", author_id)
Which I recommend to everyone. Use Memurai if you are limited to Windows. Sad but true, not a lot of other options for Postgres.

Duplicate Key error when using INSERT DEFAULT

I am getting a duplicate key error, DB2 SQL Error: SQLCODE=-803, SQLSTATE=23505, when I try to INSERT records. The primary key is one column, INTEGER 4, Generated, and it is the first column.
the insert looks like this: INSERT INTO SCHEMA.TABLE1 values (DEFAULT, ?, ?, ...)
It's my understanding that using the value DEFAULT will just let DB2 auto-generate the key at the time of insert, which is what I want. This works most of the time, but sometimes/randomly I get the duplicate key error. Thoughts?
More specifically, I'm running against DB2 9.7.0.3, using Scriptella to copy a bunch of records from one database to another. Sometimes I can process a bunch with no problems, other times I'll get the error right away, other times after 2 records, or 20 records, or 30 records, etc. Does not seem to be a pattern, nor is it the same record every time. If I change the data to copy 1 record instead of a bunch, sometimes I'll get the error one time, then it's fine the next time.
I thought maybe some other process was inserting records during my batch program, and creating keys at the same time. However, the tables I'm copying TO should not have any other users/processes trying to INSERT records during this same time frame, although there could be READS happening.
Edit: adding create info:
Create table SCHEMA.TABLE1 (
SYSTEM_USER_KEY INTEGER NOT NULL
generated by default as identity (start with 1 increment by 1 cache 20),
COL2...,
)
alter table SCHEMA.TABLE1
add constraint SYSTEM_USER_SYSTEM_USER_KEY_IDX
Primary Key (SYSTEM_USER_KEY);
You most likely have records in your table with IDs that are bigger then the next value in your identity sequence. To find out what the current value your sequence is about at, run the following query.
select s.nextcachefirstvalue-s.cache, s.nextcachefirstvalue-s.increment
from syscat.COLIDENTATTRIBUTES as a inner join syscat.sequences as s on a.seqid=s.seqid
where a.tabschema='SCHEMA'
and a.TABNAME='TABLE1'
and a.COLNAME='SYSTEM_USER_KEY'
So basically what happened is that somehow you got records in your table with ids that are bigger then the current last value of your identity sequence. So sooner or later these ids will collide with identity generated ids.
There are different reasons on how this could have happened. One possibility is that data was loaded which already contained values for the id column or that records were inserted with an actual value for the ID. Another option is that the identity sequence was reset to start at a lower value than the max id in the table.
Whatever the cause, you may also want the fix:
SELECT MAX(<primary_key_column>) FROM onsite.forms;
ALTER TABLE <table> ALTER COLUMN <primary_key_column> RESTART WITH <number from previous query + 1>;

Values missing in postgres serial field

I run a small site and use PostgreSQL 8.2.17 (only version available at my host) to store data.
In the last few months there were 3 crashes of the database system on my server and every time it happened 31 ID's from a serial field (primary key) in one of the tables were missing. There are now 93 ID's missing.
Table:
CREATE TABLE "REGISTRY"
(
"ID" serial NOT NULL,
"strUID" character varying(11),
"strXml" text,
"intStatus" integer,
"strUIDOrg" character varying(11),
)
It is very important for me that all the ID values are there. What can I do to to solve this problem?
You can not expect serial column to not have holes.
You can implement gapless key by sacrificing concurrency like this:
create table registry_last_id (value int not null);
insert into registry_last_id values (-1);
create function next_registry_id() returns int language sql volatile
as $$
update registry_last_id set value=value+1 returning value
$$;
create table registry ( id int primary key default next_registry_id(), ... )
But any transaction, which tries to insert something to registry table will block until other insert transaction finishes and writes its data to disk. This will limit you to no more than 125 inserting transactions per second on 7500rpm disk drive.
Also any delete from registry table will create a gap.
This solution is based on article Gapless Sequences for Primary Keys by A. Elein Mustain, which is somewhat outdated.
Are you missing 93 records or do you have 3 "holes" of 31 missing numbers?
A sequence is not transaction safe, it will never rollback. Therefor it is not a system to create a sequence of numbers without holes.
From the manual:
Important: To avoid blocking
concurrent transactions that obtain
numbers from the same sequence, a
nextval operation is never rolled
back; that is, once a value has been
fetched it is considered used, even if
the transaction that did the nextval
later aborts. This means that aborted
transactions might leave unused
"holes" in the sequence of assigned
values. setval operations are never
rolled back, either.
Thanks to the answers from Matthew Wood and Frank Heikens i think i have a solution.
Instead of using serial field I have to create my own sequence and define CACHE parameter to 1. This way postgres will not cache values and each one will be taken directly from the sequence :)
Thanks for all your help :)