Duplication issue with bpel service in cluster environment - db2

I have a bpel process which invokes a db2 stored procedure through java to get a unique sequence number and then insert that number into one of the db2 table. This process is running in cluster environment, so sometimes it happens that two instance of the process gets the same unique sequence number and then it tries to insert duplicate values into the table resulting in fault. Thanks.

As #mustaccio says, if you're using a sequence it's not possible to get the same value twice, regardless of how many clients are requesting values from the sequence at the same time.
That said, there's nothing that prevents you from inserting a row into a table with a value that the sequence hasn't generated yet; Then, when the sequence returns the value in question and you try to insert it, it will fail:
create table test (id int not null primary key);
create sequence s1 start with 10;
-- This will insert the value 10, which is OK.
insert into test values (nextval for s1);
-- This will insert the value 11, which is OK.
insert into test values (nextval for s1);
-- Insert a row without using sequence (BAD!)
insert into test values (13);
-- This will insert the value 12, which is OK.
insert into test values (nextval for s1);
-- This will attempt to the value 13, which will fail because it is a "duplicate",
-- but not because the sequence generated a duplicate.
insert into test values (nextval for s1);
The solution to this problem is to:
Ensure that processes don't insert values without using the sequence
If #1 is not possible, make sure that you use alter sequence to restart the sequence such that it won't generate a conflict. (in the example above, alter sequence s1 restart with 14 after performing the naughty insert statement.

As you said, the Bpel is running on clustered environmnent..
If you are using Bpel Polling, please make sure that you marked 'Distributed Polling' flag.
Distributed Polling means that when a record is read, it is locked by the reading instance. Another instance which wants to pickup the record skips locked records.

Related

psql upsert results in noncontinuous id

I have a postgresql (>9.5) table with primary_key id and a unique key col. When I use
INSERT INTO table_a (col) VLUES('xxx') ON CONFLICT(col) DO NOTHING;
to perform a upsert, let's say a row with an id 1 is generated.
If I run the sql again, nothing will happen, but actually the id 2 will be generated and abandoned.
Then if I insert a new record, for example,
INSERT INTO table_a (col) VLUES('yyy') ON CONFLICT(col) DO NOTHING;
Another row with id 3 will be generated and id 2 is wasted!
Is there anyway to avoid this waste?
Presumably id is a serial. Under the hood this causes a nextval() call from a sequence. A number nextval() once returned will never be returned again. And the call of nextval() happens before checking for possible conflicts.
From "9.16. Sequence Manipulation Functions":
nextval
(...)
Important: To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. For example an INSERT with an ON CONFLICT clause will compute the to-be-inserted tuple, including doing any required nextval calls, before detecting any conflict that would cause it to follow the ON CONFLICT rule instead. Such cases will leave unused "holes" in the sequence of assigned values. Thus, PostgreSQL sequence objects cannot be used to obtain "gapless" sequences.
Concluded that means, that the answer to your question is no, there is no way to avoid this unless you generate the values yourself somehow.

serial in postgres is being increased even though I added on conflict do nothing

I'm using Postgres 9.5 and seeing some wired things here.
I've a cron job running ever 5 mins firing a sql statement that is adding a list of records if not existing.
INSERT INTO
sometable (customer, balance)
VALUES
(:customer, :balance)
ON CONFLICT (customer) DO NOTHING
sometable.customer is a primary key (text)
sometable structure is:
id: serial
customer: text
balance: bigint
Now it seems like everytime this job runs, the id field is silently incremented +1. So next time, I really add a field, it is thousands of numbers above my last value. I thought this query checks for conflicts and if so, do nothing but currently it seems like it tries to insert the record, increased the id and then stops.
Any suggestions?
The reason this feels weird to you is that you are thinking of the increment on the counter as part of the insert operation, and therefore the "DO NOTHING" ought to mean "don't increment anything". You're picturing this:
Check values to insert against constraint
If duplicate detected, abort
Increment sequence
Insert data
But in fact, the increment has to happen before the insert is attempted. A SERIAL column in Postgres is implemented as a DEFAULT which executes the nextval() function on a bound SEQUENCE. Before the DBMS can do anything with the data, it's got to have a complete set of columns, so the order of operations is like this:
Resolve default values, including incrementing the sequence
Check values to insert against constraint
If duplicate detected, abort
Insert data
This can be seen intuitively if the duplicate key is in the autoincrement field itself:
CREATE TABLE foo ( id SERIAL NOT NULL PRIMARY KEY, bar text );
-- Insert row 1
INSERT INTO foo ( bar ) VALUES ( 'test' );
-- Reset the sequence
SELECT setval(pg_get_serial_sequence('foo', 'id'), 0, true);
-- Attempt to insert row 1 again
INSERT INTO foo ( bar ) VALUES ( 'test 2' )
ON CONFLICT (id) DO NOTHING;
Clearly, this can't know if there's a conflict without incrementing the sequence, so the "do nothing" has to come after that increment.
As already said by #a_horse_with_no_name and #Serge Ballesta serials are always incremented even if INSERT fails.
You can try to "rollback" serial value to maximum id used by changing the corresponding sequence:
SELECT setval('sometable_id_seq', MAX(id), true) FROM sometable;
As said by #a_horse_with_no_name, that is by design. Serial type fields are implemented under the hood through sequences, and for evident reasons, once you have gotten a new value from a sequence, you cannot rollback the last value. Imagine the following scenario:
sequence is at n
A requires a new value : got n+1
in a concurrent transaction B requires a new value: got n+2
for any reason A rollbacks its transaction - would you feel safe to reset sequence?
That is the reason why sequences (and serial field) just document that in case of rollbacked transactions holes can occur in the returned values. Only unicity is guaranteed.
Well there is technique that allows you to do stuff like that. They call insert mutex. It is old old old, but it works.
https://www.percona.com/blog/2011/11/29/avoiding-auto-increment-holes-on-innodb-with-insert-ignore/
Generally idea is that you do INSERT SELECT and if your values are duplicating the SELECT does not return any results that of course prevents INSERT and the index is not incremented. Bit of mind boggling, but perfectly valid and performant.
This of course completely ignores ON DUPLICATE but one gets back control over the index.

Postgres 9.3: Sharelock issue with simple INSERT

Update: Potential solution below
I have a large corpus of configuration files consisting of key/value pairs that I'm trying to push into a database. A lot of the keys and values are repeated across configuration files so I'm storing the data using 3 tables. One for all unique key values, one for all unique pair values, and one listing all the key/value pairs for each file.
Problem:
I'm using multiple concurrent processes (and therefore connections) to add the raw data into the database. Unfortunately I get a lot of detected deadlocks when trying to add values to the key and value tables. I have a tried a few different methods of inserting the data (shown below), but always end up with a "deadlock detected" error
TransactionRollbackError: deadlock detected DETAIL: Process 26755
waits for ShareLock on transaction 689456; blocked by process 26754.
Process 26754 waits for ShareLock on transaction 689467; blocked by
process 26755.
I was wondering if someone could shed some light on exactly what could be causing these deadlocks, and possibly point me towards some way of fixing the issue. Looking at the SQL statements I'm using (listed below), I don't really see why there is any co-dependency at all. Thanks for reading!
Example config file:
example_key this_is_the_value
other_example other_value
third example yet_another_value
Table definitions:
CREATE TABLE keys (
id SERIAL PRIMARY KEY,
hash UUID UNIQUE NOT NULL,
key TEXT);
CREATE TABLE values (
id SERIAL PRIMARY KEY,
hash UUID UNIQUE NOT NULL,
key TEXT);
CREATE TABLE keyvalue_pairs (
id SERIAL PRIMARY KEY,
file_id INTEGER REFERENCES filenames,
key_id INTEGER REFERENCES keys,
value_id INTEGER REFERENCES values);
SQL Statements:
Initially I was trying to use this statement to avoid any exceptions:
WITH s AS (
SELECT id, hash, key FROM keys
WHERE hash = 'hash_value';
), i AS (
INSERT INTO keys (hash, key)
SELECT 'hash_value', 'key_value'
WHERE NOT EXISTS (SELECT 1 FROM s)
returning id, hash, key
)
SELECT id, hash, key FROM i
UNION ALL
SELECT id, hash, key FROM s;
But even something as simple as this causes the deadlocks:
INSERT INTO keys (hash, key)
VALUES ('hash_value', 'key_value')
RETURNING id;
In both cases, if I get an exception thrown because the inserted hash
value is not unique, I use savepoints to rollback the change and
another statement to just select the id I'm after.
I'm using hashes for the unique field, as some of the keys and values
are too long to be indexed
Full example of the python code (using psycopg2) with savepoints:
key_value = 'this_key'
hash_val = generate_uuid(value)
try:
cursor.execute(
'''
SAVEPOINT duplicate_hash_savepoint;
INSERT INTO keys (hash, key)
VALUES (%s, %s)
RETURNING id;
'''
(hash_val, key_value)
)
result = cursor.fetchone()[0]
cursor.execute('''RELEASE SAVEPOINT duplicate_hash_savepoint''')
return result
except psycopg2.IntegrityError as e:
cursor.execute(
'''
ROLLBACK TO SAVEPOINT duplicate_hash_savepoint;
'''
)
#TODO: Should ensure that values match and this isn't just
#a hash collision
cursor.execute(
'''
SELECT id FROM keys WHERE hash=%s LIMIT 1;
'''
(hash_val,)
)
return cursor.fetchone()[0]
Update:
So I believe I a hint on another stackexchange site:
Specifically:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time1. However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress). If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it2, otherwise it
will attempt to apply its operation to the updated version of the row.
While I'm still not exactly sure where the co-dependency is, it seems that processing a large number of key/value pairs without commiting would likely result in something like this. Sure enough, if I commit after each individual configuration file is added, the deadlocks don't occur.
It looks like you're in this situation:
The table to INSERT into has a primary key (or unique index(es) of any sort).
Several INSERTs into that table are performed within one transaction (as opposed to committing immediately after each one)
The rows to insert come in random order (with regard to the primary key)
The rows are inserted in concurrent transactions.
This situation creates the following opportunity for deadlock:
Assuming there are two sessions, that each started a transaction.
Session #1: insert row with PK 'A'
Session #2: insert row with PK 'B'
Session #1: try to insert row with PK 'B'
=> Session #1 is put to wait until Session #2 commits or rollbacks
Session #2: try to insert row with PK 'A'
=> Session #2 is put to wait for Session #1.
Shortly thereafter, the deadlock detector gets aware that both sessions are now waiting for each other, and terminates one of them with a fatal deadlock detected error.
If you're in this scenario, the simplest solution is to COMMIT after a new entry is inserted, before attempting to insert any new row into the table.
Postgres is known for that type of deadlocks, to be honest. I often encounter such problems when different workers update information about interleaving entities. Recently I had a task of importing a big list of scientific papers metadata from multiple json files. I was using parallel processes via joblib to read from several files at the same time. Deadlocks were hanging all the time on authors(id bigint primary key, name text) table all the time 'cause many files contained papers of the same authors, therefore producing inserts with oftentimes the same authors. I was using insert into authors (id,name) values %s on conflict(id) do nothing, but that was not helping. I tried sorting tuples before sending them to Postgres server, with little success. What really helped me was keeping a list of known authors in a Redis set (accessible to all processes):
if not rexecute("sismember", "known_authors", author_id):
# your logic...
rexecute("sadd", "known_authors", author_id)
Which I recommend to everyone. Use Memurai if you are limited to Windows. Sad but true, not a lot of other options for Postgres.

PostgreSQL: trivial INSERT fails the first time, succeeds afterwards

I am puzzled by a weird Postgres problem I encounter in the trivial database shown below: If I first insert a tag and explicitly specify its ID and then try to insert another tag without passing an ID, then this second insert fails. If I try a third time (again without ID), the insert succeeds.
DROP DATABASE IF EXISTS mydb;
CREATE DATABASE mydb;
\c mydb
DROP SCHEMA public;
CREATE SCHEMA core;
CREATE TABLE core.tag
(
id serial PRIMARY KEY,
title text NOT NULL
);
-- this works: all columns specified explicitly
INSERT INTO core.tag(id, title) VALUES (1, 'known tag');
-- omitting the tag ID fails with
-- ERROR: duplicate key value violates unique constraint "tag_pkey"
-- DETAIL: Key (id)=(1) already exists.
INSERT INTO core.tag(title) VALUES ('unknown tag');
-- this works again ?!?
INSERT INTO core.tag(title) VALUES ('unknown tag');
The issue only seems to occur on a freshly created database and once it does, it does not seem to happen again. I have never come across anything like this - so far, I have just inserted data with or without explicit ID and AFAICS, nothing ever failed like this...
Does anyone have an idea what's going on here ?!?
Environment: PostgreSQL 9.1.3 on Mac OSX 10.7.5
Of course this fails.
What happens?
When you create the table, a sequence is also created that generates the values for the ID column. The sequence starts with 1 but it is only used if you do not specify a value for the ID column.
Now when you run
INSERT INTO core.tag(id, title) VALUES (1, 'known tag');
you bypass Postgres' automatic assigment of the ID value, the sequence "stays" at one.
Now when you run
INSERT INTO core.tag(title) VALUES ('unknown tag');
Postgres takes the next value from the sequence - which is 1. But that alreay exists so the insert fails. After taking the value from the sequence, the next value is 2, so the subsequent insert without specifying an ID value gets the 2 and succeeds.
The solution is to either never include the ID column in your inserts. Or - if you do - request the ID from the sequence:
INSERT INTO core.tag(id, title) VALUES (nextval('tag_id_seq'), 'known tag');
When a serial column is created it is automatically associated with a sequence which is named <table_name>_<column_name>_seq. And that's the name I used in the above statement.
More details about how the serial "data type" works are in the manual: http://www.postgresql.org/docs/current/static/datatype-numeric.html#DATATYPE-SERIAL

Values missing in postgres serial field

I run a small site and use PostgreSQL 8.2.17 (only version available at my host) to store data.
In the last few months there were 3 crashes of the database system on my server and every time it happened 31 ID's from a serial field (primary key) in one of the tables were missing. There are now 93 ID's missing.
Table:
CREATE TABLE "REGISTRY"
(
"ID" serial NOT NULL,
"strUID" character varying(11),
"strXml" text,
"intStatus" integer,
"strUIDOrg" character varying(11),
)
It is very important for me that all the ID values are there. What can I do to to solve this problem?
You can not expect serial column to not have holes.
You can implement gapless key by sacrificing concurrency like this:
create table registry_last_id (value int not null);
insert into registry_last_id values (-1);
create function next_registry_id() returns int language sql volatile
as $$
update registry_last_id set value=value+1 returning value
$$;
create table registry ( id int primary key default next_registry_id(), ... )
But any transaction, which tries to insert something to registry table will block until other insert transaction finishes and writes its data to disk. This will limit you to no more than 125 inserting transactions per second on 7500rpm disk drive.
Also any delete from registry table will create a gap.
This solution is based on article Gapless Sequences for Primary Keys by A. Elein Mustain, which is somewhat outdated.
Are you missing 93 records or do you have 3 "holes" of 31 missing numbers?
A sequence is not transaction safe, it will never rollback. Therefor it is not a system to create a sequence of numbers without holes.
From the manual:
Important: To avoid blocking
concurrent transactions that obtain
numbers from the same sequence, a
nextval operation is never rolled
back; that is, once a value has been
fetched it is considered used, even if
the transaction that did the nextval
later aborts. This means that aborted
transactions might leave unused
"holes" in the sequence of assigned
values. setval operations are never
rolled back, either.
Thanks to the answers from Matthew Wood and Frank Heikens i think i have a solution.
Instead of using serial field I have to create my own sequence and define CACHE parameter to 1. This way postgres will not cache values and each one will be taken directly from the sequence :)
Thanks for all your help :)