Postgres 'if not exists' fails because the sequence exists - postgresql

I have several counters in an application I am building, and I am trying to get them created dynamically by the application as required.
For a simple example: if someone types a word into a script, it should return the number of times that word has been entered previously. Here is an example of the SQL that might be executed if they typed the word 'example':
CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
SELECT nextval('example')
This would return 1 the first time it ran, 2 the second time, etc.
The problem is when 2 people click the button at the same time.
First, please note that a lot more is happening in my application than just these statements, so the chance of them overlapping is much greater than it would be if this were all that was happening.
1> BEGIN;
2> BEGIN;
1> CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
2> CREATE SEQUENCE IF NOT EXISTS example START WITH 1; -- is blocked by previous statement
1> SELECT nextval('example') -- returns 1 to user.
1> COMMIT; -- unblocks second connection
2> ERROR: duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(example, 109649) already exists.
I was under the impression that with "IF NOT EXISTS" the statement would simply be a no-op if the sequence already exists, but there seems to be a race condition where that is not the case. I say race condition because if these two transactions are not executed at the same time, it works as one would expect.
I have noticed that IF NOT EXISTS is fairly new to Postgres, so maybe the kinks haven't all been worked out yet?
EDIT:
The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)

Sequences are part of the database schema. If you find yourself modifying the schema dynamically based on the data stored in the database, you are probably doing something wrong. This is especially true for sequences, which have special properties, e.g. regarding their behavior with respect to transactions. Specifically, if you increment a sequence (with nextval) in the middle of a transaction and then roll that transaction back, the value of the sequence will not be rolled back. Most likely, that is not behavior you want for your data.
In your example, imagine a user tries to add a word, which increments the corresponding sequence. Now imagine the transaction does not complete for some reason (e.g. the computer crashes) and gets rolled back. You would end up with the word not being added to the database, but with the sequence incremented anyway.
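You can see this non-transactional behavior directly (a minimal demonstration in psql, assuming the sequence example from the question exists and its last value was, say, 5):
BEGIN;
SELECT nextval('example'); -- returns 6
ROLLBACK;
SELECT nextval('example'); -- returns 7, not 6: the increment survived the rollback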
For the particular example you mentioned, there is an easy solution: create an ordinary table to store all the "sequences". Something like this would do it:
CREATE TABLE word_frequency (
    word text NOT NULL UNIQUE,
    frequency integer NOT NULL
);
Now I understand that this is just an example, but if this approach doesn't work for your actual use case, let us know and we can adjust it to your needs.
Edit: Here's how the above solution works. When a new word is added, run the following query ("UPSERT" syntax, available in Postgres 9.5+ only):
INSERT INTO word_frequency(word,frequency)
VALUES ('foo',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;
This query will insert a new word into word_frequency with frequency 1, or, if the word already exists, increment its frequency by 1. Now, what happens if two transactions try to do this at the same time? Consider the following scenario:
client 1                  client 2
--------                  --------
BEGIN
                          BEGIN
UPSERT ('foo',1)
                          UPSERT ('foo',1)   <====
COMMIT
                          COMMIT
As soon as client 2 tries to increment the frequency for foo (marked with the arrow above), that operation blocks, because the row has been modified by a different, uncommitted transaction. When client 1 commits, client 2 is unblocked and continues without any errors. This is exactly how we want it to work. Also note that PostgreSQL uses row-level locking to implement this behavior, so insertions and updates for other words are not blocked.
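To illustrate that last point with a hypothetical second word 'bar': while client 1's transaction is still open, another client can run the same UPSERT for 'bar' without waiting, because it touches a different row:
INSERT INTO word_frequency(word,frequency)
VALUES ('bar',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;
Only UPSERTs on the same word contend with each other.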

EDIT: The main reason we were considering doing things this way was to
avoid excess locking. The thought being that if two people were to
increment at the same time, using a sequence would mean that neither
user should have to wait for the other (except, as in this example,
for the initial creation of that sequence)
It sounds like you're optimizing for a problem that likely does not exist. Sure, if you have 100,000 simultaneous users doing nothing but inserting rows (normally the only time a sequence is used), there is the possibility of some contention on the sequence, but realistically there will be other bottlenecks long before the sequence gets in the way.
I'd advise you to first prove that the sequence is an issue. With a proper database design (which dynamic DDL is not), the sequence will not be the bottleneck.
As a reference, DDL is not transaction-safe in most databases.

Related

Sequence is giving already given numbers

I'm using a sequence datasource_id_seq created by hand to create unique table names by concatenating a string with numbers returned by the sequence via select nextval('datasource_id_seq').
The code that creates it is in my very first migration:
create sequence datasource_id_seq;
And there's nothing like that again in the whole code base.
I recently stumbled into a bug that turned out to be the sequence handing out numbers it had already given: it was returning values in the 6xx (six hundreds) while we already have tables with names over 3xxx (three thousands).
Reading the docs (https://www.postgresql.org/docs/12/functions-sequence.html), the only thing I could find that points to a sequence being reset is:
If a sequence object has been created with default parameters, successive nextval calls will return successive values beginning with 1
So, the only ways to reset a sequence are to recreate it or to use setval(), neither of which appears in the code base.
My question is: how can a sequence reset happen? What other means are there to reset a sequence?
With a properly working, bug-free PostgreSQL server it is not possible for a sequence to allocate the same number two or more times; a value is not even "returned" to the sequence in case of a rollback. The only way to manipulate the next value is with the functions you've already mentioned.
Look for the problem during deployment of your app. I'd speculate that the sequence got dropped and recreated.
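If it did get reset, a one-off repair is to advance the sequence past the highest number already in use (3500 here is purely illustrative; substitute your actual maximum):
SELECT setval('datasource_id_seq', 3500); -- the next nextval() will return 3501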

loop and change all record numbers in Firebird database

I have a database table with unique record numbers created with a generator, but because of an error in the code (setting generators) the record numbers suddenly became large and many numbers are skipped. I would like to renumber all records, starting with 1 and finishing with the total record count. Doing it through the application would take a lot of time.
From the Firebird documentation this looks like a simple task using a loop, but I have no experience with Firebird programming; I use only simple SQL statements. Can somebody help?
Actually there is no need to program a loop; a simple update statement should do. First, reset the generator:
SET GENERATOR my_GEN TO 0;
and then update all the records, assigning each a new id:
update tab set recno = gen_id(my_GEN, 1) order by recno asc;
This assumes that all references to the recno field are via foreign keys with ON UPDATE CASCADE; otherwise you will either mess up your data or the update will fail.
During this operation there should be no other users in the database!
That being said, you really shouldn't care about gaps in your record numbers.

Understanding MON$STAT_ID in the Firebird monitoring tables

I posted a few weeks back inquiring about the Firebird DB and how to monitor it. Since then I have come up with a nifty script that monitors all of the page reads/writes/fetches/marks. Among the columns I am monitoring are the MON$STAT_ID and MON$STAT_GROUP fields. These print out a nice number for me; however, I have no way to correlate it and understand what exactly it means. I thought printing out MON$STAT_GROUP would help, but it has yet to assist me in any way...
I have also looked into the RDB$ commands but have found very limited documentation to see if they might assist me in monitoring my database.
So I decided to come here and ask, first, whether I am monitoring my database in a way that lets others view the page reads/writes/fetches/marks and make an intelligent decision on whether or not the database is performing as expected.
Secondly, would adding RDB$ commands to my script add anything to the value of the data that I will be giving our database folks?
Lastly, and maybe most importantly, is there any way to correlate the MON$STAT_ID fields to an actual table in the database, to understand when something is going on that should not be? I currently monitor the database every minute, which may be too frequent, but I am getting valid data out. The only question now is how to interpret this data. Can someone give me advice on methods they use/have used in the past that have worked for them?
(NOTE: Running firebird 2.1)
The column MON$STAT_ID in MON$IO_STATS (and MON$RECORD_STATS and MON$MEMORY_USAGE) is the primary key of the record in the monitoring table. Almost all other monitoring tables include a MON$STAT_ID to point to these statistics: MON$ATTACHMENTS, MON$CALL_STACK, MON$DATABASE, MON$STATEMENTS, MON$TRANSACTIONS.
In other words: the statistics apply at the database, attachment, transaction, statement or call level (PSQL executions). The statistics tables contain a column called MON$STAT_GROUP to distinguish these types. The values of MON$STAT_GROUP are described in RDB$TYPES:
0 : DATABASE
1 : ATTACHMENT
2 : TRANSACTION
3 : STATEMENT
4 : CALL
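They can also be listed straight from the system table (a quick lookup, assuming the standard layout of RDB$TYPES):
SELECT RDB$TYPE, RDB$TYPE_NAME
FROM RDB$TYPES
WHERE RDB$FIELD_NAME = 'MON$STAT_GROUP';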
Typically the statistics of level 0 contain everything from level 1, level 1 contains everything from level 2 for that attachment, level 2 contains everything from level 3 for that transaction, and level 3 contains everything from level 4 for that statement.
However, as data might be processed that is unrelated to any lower level, or a specific attachment, transaction or statement handle may already have been dropped, the numbers of the lower levels do not necessarily add up to the totals of the higher level.
There is no way to correlate the statistics to a specific table (as this information isn't table related, but - simplified - from executing statements which might cover multiple tables).
As I also commented, I am unsure what you mean by "RDB$ commands", but I am assuming you are talking about RDB$GET_CONTEXT() and RDB$SET_CONTEXT(). You could use RDB$GET_CONTEXT() to obtain the current connection id (SESSION_ID) and transaction id (TRANSACTION_ID). These values can be used for MON$ATTACHMENT_ID and MON$TRANSACTION_ID in the monitoring tables. I don't think the other variables in the SYSTEM namespace are interesting, and those in USER_SESSION and USER_TRANSACTION are all user-defined (and initially those namespaces are empty).
It is far easier to use the CURRENT_CONNECTION and CURRENT_TRANSACTION context variables within a statement. As documented in doc\README.monitoring_tables.txt in the Firebird installation:
System variables CURRENT_CONNECTION and CURRENT_TRANSACTION could be used to select data about the current (for the caller) connection and transaction respectively. These variables correspond to the ID columns of the appropriate monitoring tables.
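For example, a query along these lines (using only documented monitoring-table columns) returns the I/O statistics of the current connection:
SELECT a.MON$ATTACHMENT_ID, s.MON$PAGE_READS, s.MON$PAGE_WRITES,
       s.MON$PAGE_FETCHES, s.MON$PAGE_MARKS
FROM MON$ATTACHMENTS a
JOIN MON$IO_STATS s ON s.MON$STAT_ID = a.MON$STAT_ID
WHERE a.MON$ATTACHMENT_ID = CURRENT_CONNECTION;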
Note: my answer is based on Firebird 2.5.
To present statistics for specific tables I use this SQL (FB 3):
select t.mon$table_name, trim(
    case when r.mon$record_seq_reads > 0 then 'Non index Reads: ' || r.mon$record_seq_reads else '' end ||
    case when r.mon$record_idx_reads > 0 then ' Index Reads: ' || r.mon$record_idx_reads else '' end ||
    case when r.mon$record_inserts > 0 then ' Inserts: ' || r.mon$record_inserts else '' end ||
    case when r.mon$record_updates > 0 then ' Updates: ' || r.mon$record_updates else '' end ||
    case when r.mon$record_deletes > 0 then ' Deletes: ' || r.mon$record_deletes else '' end)
from mon$table_stats t
join mon$record_stats r on r.mon$stat_id = t.mon$record_stat_id
where t.mon$table_name not starting 'RDB$' and r.mon$stat_group = 2
order by 1

How to find out if a sequence was initialized in this session?

I need to read the current value of a sequence in a function. However, for the first time in each session I try to use currval(), I get following error:
currval of sequence "foo_seq" is not yet defined in this session
Hint for those who might find this question by googling for this error: you need to initialize the sequence for each session, either by nextval() or setval().
I could use something like lastval() or even setval('your_table_id_seq', (SELECT MAX(id) FROM your_table)); instead, but this seems either prone to gaps or slower than a simple currval(). My aim is to avoid gaps and inconsistencies (I know some of the values will be added manually), so calling nextval() before the logic that handles them is not ideal for my purpose. I would need it to initialize the sequence for the session anyway, but I would prefer something like this:
--start of the function here
IF is_not_initialized THEN
    SELECT setval('foo_seq', (SELECT MAX(id) FROM bar_table)) INTO _current;
ELSE
    SELECT currval('foo_seq') INTO _current;
END IF;
--some magic with the _current variable and nextval() on the right position
The point is that I have no idea what "is_not_initialized" might look like, or whether it is possible at all. Is there any function or other trick to do it?
EDIT: Actually, my plan is to let each group of customers choose between a proper sequence, no sequence at all, and the strange "something like a sequence" I'm asking about now. Even if a customer wanted such a strange sequence, it would be used only for the columns where it is needed - usually because there are some analog data and we need to store their keys (usually an almost gapless sequence) in the DB for backward compatibility.
Anyway, you are right that this is hardly a proper solution and that no sequence at all might be better than such a messy workaround in those situations, so I'll think (and discuss with customers) again about whether it is really needed.
Craig, a_horse and pozs have provided information which can help you understand the principles of using sequences. Apart from the question of how you are going to use it, here is a function which returns the current value of a sequence if it has been initialized in this session, or null otherwise.
If a sequence seq has not been initialized yet, currval(seq) raises an exception with SQLSTATE 55000.
create or replace function current_seq_value(seq regclass)
returns bigint -- currval() returns bigint, so avoid truncating to integer
language plpgsql
as $$
begin
    begin
        return (select currval(seq));
    exception
        -- sqlstate 55000: "currval of sequence ... is not yet defined in this session"
        when sqlstate '55000' then return null;
    end;
end $$;
select current_seq_value('my_table_id_seq');
My aim is to avoid gaps and inconsistencies
You cannot use sequences if you want to avoid gaps. Nor can you reasonably use sequences if you want to assign some values manually.
The approach you are taking is unsound. It will not work. Forget about it, it isn't going to do what you think it's going to do.
I just wrote a sample implementation of a trivial gapless sequence generator for someone a few days ago, and there's a more complete one in this question.
You need to understand that unlike true sequences, gapless sequences are transactional. A consequence is that only one running transaction can have an uncommitted ID. If 100 concurrent transactions try to get IDs, only one of them will actually get the ID. The others will have to wait until that one commits or rolls back. So they're terrible for concurrency, especially if combined with long running transactions. They can also cause deadlocks if you use multiple different gapless sequences and different transactions might access them in different orders.
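To make the trade-off concrete, here is a minimal sketch of the usual counter-table approach (table and counter names are hypothetical):
CREATE TABLE gapless_seq (
    name  text PRIMARY KEY,
    value bigint NOT NULL
);
INSERT INTO gapless_seq VALUES ('invoice', 0);
-- Each caller takes a row lock that is held until its transaction
-- commits or rolls back, so concurrent callers queue up here:
UPDATE gapless_seq
SET value = value + 1
WHERE name = 'invoice'
RETURNING value;
A rollback releases the lock without consuming the number, which is what keeps the sequence gapless and also what serializes all writers.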
So think carefully whether you really need this.
Read: PostgreSQL gapless sequences

Sequence field, tracking individual sequences for primary key/sequence pair

I have a table of user-uploaded objects. Each user can have an arbitrary number of objects. I want each object to have a sequential identifier, like so:
USERNAME     OBJECTNAME   OBJID
Kerin        cat          1
Kerin        dog          2
Narcolepsy   pie_tins     1
Kerin        mouse        3
I'd like OBJID to be a sequence, but with the sequence number tracked individually per USERNAME. I can sort of accomplish this by first querying the DB to SELECT the highest OBJID for that user, incrementing it by one, and using the result in my INSERT. That is probably fine, because it would be difficult for a user to run two uploads at once, but the query overhead and the feeling that I'm doing it wrong make me want to find a better way.
If you don't need them to be sequential, you could probably get away with adding a PK of type serial (or bigserial) to the table. The numbers would still be unique per username, it would be dead simple to implement, and you wouldn't have the ugliness of UUIDs.
You could create one sequence per username through manual CREATE SEQUENCE calls. Then, you could add a BEFORE INSERT trigger to set the objid by figuring out which sequence to use and then calling nextval on it. If your usernames are limited to the usual /[a-z][a-z0-9]*/ pattern, then you could build the sequence names as something like "seq_objid_username" and the trigger would be able to figure out which sequence to use quite easily; the per-username sequences could be created by an INSERT trigger on your user table. This approach will work and it will be safe because it relies on PostgreSQL's existing transaction-safe sequence system.
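A minimal sketch of that trigger (hypothetical names: a table objects(username, objname, objid) and per-user sequences following the seq_objid_<username> scheme, which must already exist; the %s formatting relies on the usernames matching /[a-z][a-z0-9]*/ as described above):
CREATE OR REPLACE FUNCTION set_objid() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- pick the per-user sequence by name, e.g. seq_objid_kerin
    NEW.objid := nextval(format('seq_objid_%s', NEW.username));
    RETURN NEW;
END $$;
CREATE TRIGGER objects_set_objid
BEFORE INSERT ON objects
FOR EACH ROW EXECUTE PROCEDURE set_objid();
The per-user sequence itself can be created from an INSERT trigger on the user table with dynamic SQL, e.g. EXECUTE format('CREATE SEQUENCE %I', 'seq_objid_' || NEW.username).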