How to find out if a sequence was initialized in this session? - postgresql

I need to read the current value of a sequence in a function. However, for the first time in each session I try to use currval(), I get following error:
currval of sequence "foo_seq" is not yet defined in this session
Hint for those who might find this question by googling for this error: you need to initialize the sequence for each session, either by nextval() or setval().
I could use something like lastval() or even setval('your_table_id_seq', (SELECT MAX(id) FROM your_table)); instead, but this seems seems either prone to gaps or slower than simple currval(). My aim is to avoid gaps and inconsistencies (I know some of the values will be added manually), so using nextval() before logic handling them is not ideal for my purpose. I would need this to initialize the sequence for the session anyway, but I would prefer to do something like this:
--start of the function here
IF is_not_initialized THEN
SELECT setval('foo_seq', (SELECT MAX(id) FROM bar_table)) INTO _current;
ELSE
SELECT currval('foo_seq') INTO _current;
END IF;
--some magic with the _current variable and nextvalue() on the right position
The point is that I have no idea how might "is_not_initialized" look like and whether is it possible at all. Is there any function or other trick to do it?
EDIT: Actually, my plan is to let each group of customers choose between proper sequence, no sequence at all, and the strange "something like a sequence" I'm asking for now. Even if the customer wanted such a strange sequence, it would be used only for the columns where it is needed - usually because there are some analog data and we need to store their keys (usually almost gapless sequence) into the DB for backward compatibility.
Anyway, you are right that this is hardly proper solution and that no sequence might be better than such a messy workaround in those situations, so I'll think (and discuss with customers) again whether it is really needed.

Craig, a_horse and pozs have provided information which can help you understand principles of using sequences. Apart from the question how are you going to use it, here is a function which returns current value of a sequence if it has been initialized or null otherwise.
If a sequence seq has not been initialized yet, currval(seq) raises exception with sqlstate 55000.
create or replace function current_seq_value(seq regclass)
returns integer language plpgsql
as $$
begin
begin
return (select currval(seq));
exception
when sqlstate '55000' then return null;
end;
end $$;
select current_seq_value('my_table_id_seq')

My aim is to avoid gaps and inconsistencies
You cannot use sequences if you want to avoid gaps. Nor can you reasonably use sequences if you want to assign some values manually.
The approach you are taking is unsound. It will not work. Forget about it, it isn't going to do what you think it's going to do.
I just wrote a sample implementation of a trivial gapless sequence generator for someone a few days ago, and there's a more complete one in this question.
You need to understand that unlike true sequences, gapless sequences are transactional. A consequence is that only one running transaction can have an uncommitted ID. If 100 concurrent transactions try to get IDs, only one of them will actually get the ID. The others will have to wait until that one commits or rolls back. So they're terrible for concurrency, especially if combined with long running transactions. They can also cause deadlocks if you use multiple different gapless sequences and different transactions might access them in different orders.
So think carefully whether you really need this.
Read: PostgreSQL gapless sequences

Related

Find the usage of a Postgres procedure

I want to change a Postgres procedure, but before doing this I want to make sure that it does not break anything.
To do this, I need to find its usage.
If you have track_functions set to pl or all, you can use the view pg_stat_user_functions to see how often a function was called.
But that's probably not what you want; I assume you want to know from which other functions a function is called. This is much more difficult, because PostgreSQL does not track dependencies between a function and the objects used in its code. The best you can do is perform a string search:
SELECT oid::regprocedure
FROM pg_catalog.pg_proc
WHERE prosrc ILIKE '%your_function_name%';
That can of course produce false positives, particularly if the function name you search for is a string commonly used in code.
There can potentially be false negatives as well, consider this piece of PL/pgSQL code:
EXECUTE 'SELECT your_func' || 'tion_name()';
But you may be able to exclude such cases if you know your code base.
A completely different matter is the question where in your application code the function is used. But since you tagged the question only as postgresql, I assume that this is out of scope here.

How to implement a high performing non incremental ID in postgresql? [duplicate]

I would like to replace some of the sequences I use for id's in my postgresql db with my own custom made id generator. The generator would produce a random number with a checkdigit at the end. So this:
SELECT nextval('customers')
would be replaced by something like this:
SELECT get_new_rand_id('customer')
The function would then return a numerical value such as: [1-9][0-9]{9} where the last digit is a checksum.
The concerns I have is:
How do I make the thing atomic
How do I avoid returning the same id twice (this would be caught by trying to insert it into a column with unique constraint but then its to late to I think)
Is this a good idea at all?
Note1: I do not want to use uuid since it is to be communicated with customers and 10 digits is far simpler to communicate than the 36 character uuid.
Note2: The function would rarely be called with SELECT get_new_rand_id() but would be assigned as default value on the id-column instead of nextval().
EDIT: Ok, good discussusion below! Here are some explanation for why:
So why would I over-comlicate things this way? The purpouse is to hide the primary key from the customers.
I give each new customer a unique
customerId (generated serial number in
the db). Since I communicate that
number with the customer it is a
fairly simple task for my competitors
to monitor my business (there are
other numbers such as invoice nr and
order nr that have the same
properties). It is this monitoring I
would like to make a little bit
harder (note: not impossible but
harder).
Why the check digit?
Before there was any talk of hiding the serial nr I added a checkdigit to ordernr since there were klumbsy fingers at some points in the production, and my thought was that this would be a good practice to keep in the future.
After reading the discussion I can certainly see that my approach is not the best way to solve my problem, but I have no other good idea of how to solve it, so please help me out here.
Should I add an extra column where I put the id I expose to the customer and keep the serial as primary key?
How can I generate the id to expose in a sane and efficient way?
Is the checkdigit necessary?
For generating unique and random-looking identifiers from a serial, using ciphers might be a good idea. Since their output is bijective (there is a one-to-one mapping between input and output values) -- you will not have any collisions, unlike hashes. Which means your identifiers don't have to be as long as hashes.
Most cryptographic ciphers work on 64-bit or larger blocks, but the PostgreSQL wiki has an example PL/pgSQL procedure for a "non-cryptographic" cipher function that works on (32-bit) int type. Disclaimer: I have not tried using this function myself.
To use it for your primary keys, run the CREATE FUNCTION call from the wiki page, and then on your empty tables do:
ALTER TABLE foo ALTER COLUMN foo_id SET DEFAULT pseudo_encrypt(nextval('foo_foo_id_seq')::int);
And voila!
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> select * from foo;
foo_id
------------
1241588087
1500453386
1755259484
(4 rows)
I added my comment to your question and then realized that I should have explained myself better... My apologies.
You could have a second key - not the primary key - that is visible to the user. That key could use the primary as the seed for the hash function you describe and be the one that you use to do lookups. That key would be generated by a trigger after insert (which is much simpler than trying to ensure atomicity of the operation) and
That is the key that you share with your clients, never the PK. I know there is debate (albeit, I can't understand why) if PKs are to be invisible to the user applications or not. The modern database design practices, and my personal experience, all seem to suggest that PKs should NOT be visible to users. They tend to attach meaning to them and, over time, that is a very bad thing - regardless if they have a check digit in the key or not.
Your joins will still be done using the PK. This other generated key is just supposed to be used for client lookups. They are the face, the PK is the guts.
Hope that helps.
Edit: FWIW, there is little to be said about "right" or "wrong" in database design. Sometimes it boils down to a choice. I think the choice you face will be better served by leaving the PK alone and creating a secondary key - just that.
I think you are way over-complicating this. Why not let the database do what it does best and let it take care of atomicity and ensuring that the same id is not used twice? Why not use a postgresql SERIAL type and get an autogenerated surrogate primary key, just like an integer IDENTITY column in SQL Server or DB2? Use that on the column instead. Plus it will be faster than your user-defined function.
I concur regarding hiding this surrogate primary key and using an exposed secondary key (with a unique constraint on it) to lookup clients in your interface.
Are you using a sequence because you need a unique identifier across several tables? This is usually an indication that you need to rethink your table design, and those several tables should perhaps be combined into one, with an autogenerated surrogate primary key.
Also see here
How you generate the random and unique ids is a useful question - but you seem to be making a counter productive assumption about when to generate them!
My point is that you do not need to generate these id's at the time of creating your rows, because they are essentially independent of the data being inserted.
What I do is pre-generate random id's for future use, that way I can take my own sweet time and absolutely guarantee they are unique, and there's no processing to be done at the time of the insert.
For example I have an orders table with order_id in it. This id is generated on the fly when the user enters the order, incrementally 1,2,3 etc forever. The user does not need to see this internal id.
Then I have another table - random_ids with (order_id, random_id). I have a routine that runs every night which pre-loads this table with enough rows to more than cover the orders that might be inserted in the next 24 hours. (If I ever get 10000 orders in one day I'll have a problem - but that would be a good problem to have!)
This approach guarantees uniqueness and takes any processing load away from the insert transaction and into the batch routine, where it does not affect the user.
Your best bet would probably be some form of hash function, and then a checksum added to the end.
If you're not using this too often (you do not have a new customer every second, do you?) then it is feasible to just get a random number and then try to insert the record. Just be prepared to retry inserting with another number when it fails with unique constraint violation.
I'd use numbers 1000000 to 999999 (900000 possible numbers of the same length) and check digit using UPC or ISBN 10 algorithm. 2 check digits would be better though as they'll eliminate 99% of human errors instead of 9%.

Postgres 'if not exists' fails because the sequence exists

I have several counters in an application I am building, as am trying to get them to be dynamically created by the application as required.
For a simplistic example, if someone types a word into a script it should return the number of times that word has been entered previously. Here is an example of sql that may be executed if they typed the word example.
CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
SELECT nextval('example')
This would return 1 the first time it ran, 2 the second time, etc.
The problem is when 2 people click the button at the same time.
First, please note that a lot more is happening in my application than just these statements, so the chances of them overlapping is much more significant than it would be if this was all that was happening.
1> BEGIN;
2> BEGIN;
1> CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
2> CREATE SEQUENCE IF NOT EXISTS example START WITH 1; -- is blocked by previous statement
1> SELECT nextval('example') -- returns 1 to user.
1> COMMIT; -- unblocks second connection
2> ERROR: duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(example, 109649) already exists.
I was under the impression that by using "IF NOT EXISTS", the statement should just be a no-op if it does exist, but it seems to have this race condition where that is not the case. I say race condition because if these two are not executed at the same time, it works as one would expect.
I have noticed that IF NOT EXISTS is fairly new to postgres, so maybe they haven't worked out all of the kinks yet?
EDIT:
The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)
Sequences are part of the database schema. If you find yourself modifying the schema dynamically based on the data stored in the database, you are probably doing something wrong. This is especially true for sequences, which have special properties e.g. regarding their behavior with respect to transactions. Specifically, if you increment a sequence (with the help of nextval) in the middle of a transaction and then you rollback that transaction, the value of the sequence will not be rolled back. So most likely, this kind of behavior is something that you don't want with your data. In your example, imagine that a user tries to add word. This results in the corresponding sequence being incremented. Now imagine that the transaction does not complete for reason (e.g. maybe the computer crashes) and it gets rolled back. You would end up with the word not being added to the database but with the sequence being incremented.
For the particular example that you mentioned, there is an easy solution; create an ordinary table to store all the "sequences". Something like that would do it:
CREATE TABLE word_frequency (
word text NOT NULL UNIQUE,
frequency integer NOT NULL
);
Now I understand that this is just an example, but if this approach doesn't work for your actual use case, let us know and we can adjust it to your needs.
Edit: Here's how you the above solution works. If a new word is added, run the following query ("UPSERT" syntax in postgres 9.5+ only):
INSERT INTO word_frequency(word,frequency)
VALUES ('foo',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;
This query will insert a new word in word_frequency with frequency 1, or if the word exists already it will increment the existing frequency by 1. Now what happens if two transaction try to do that at the same time? Consider the following scenario:
client 1 client 2
-------- --------
BEGIN
BEGIN
UPSERT ('foo',1)
UPSERT ('foo',1) <====
COMMIT
COMMIT
What will happen is that as soon as client 2 tries increment the frequency for foo (marked with the arrow above), that operation will block because the row was modified by a different transaction. When client 1 commits, client 2 will get unblocked and continue without any errors. This is exactly how we wanted it to work. Also note, that postgresql will use row-level locking to implement this behavior, so other insertions will not be blocked.
EDIT: The main reason we were considering doing things this way was to
avoid excess locking. The thought being that if two people were to
increment at the same time, using a sequence would mean that neither
user should have to wait for the other (except, as in this example,
for the initial creation of that sequence)
It sounds like you're optimizing for a problem that likely does not exist. Sure, if you have 100,000 simultaneous users that are only inserting rows (since a sequence will only be used then normally) there is the possibility of some contention with the sequence but realistically there will be other bottle necks long before the sequence gets in the way.
I'd advise you to first prove that the sequence is an issue. With a proper database design (which dynamic DDL is not) the sequence will not be the bottle neck.
As a reference, DDL is not transaction safe in most databases.

Dealing with division errors in PostgresQL without procedural code - is it possible?

I have to produce (in PostgresQL, if that matters) a table containing a column with the quotient of two sums, basically like this (quite simplified):
select name, sum(a)/sum(b), sum(c)/sum(d)
from a_complex_nested_select_query_with_many_zeros
group by name
order by name;
The table has tens of thousands of rows (not too big), but in a few cases, summing over b or d does produce 0, which causes the whole query to fail with Divide by 0.
In researching how to deal with the exception, I was only able to find information on PL/pgSQL Control Structures, which appears to require the creation of a function (but I'm not sure).
My question is of course how to make this query work. Perhaps the answer has something to do with
Can an exception be caught in non-procedural SQL (PostgresQL, perhaps?)
Is this a case where procedural code is necessary?
Can a CASE..WHEN..ELSE..END structure avoid the problem (I'm stuck on this because it looks like the SUM() calls are repeated!), but it is appealing because I do not know enough about Postgres to know whether exception catching has a performance penalty.
Is there a way to, again without a function, ensure SUM() is evaluated once in a CASE expression?
If a function is required, what would it look like?
EDIT By "repeating sum calls" I mean that I know I could write:
select name,
case when sum(b)=0 then null else sum(a)/sum(b) end,
case when sum(d)=0 then null else sum(c)/sum(d) end
and so on, but am not sure if that is a good thing. (I guess someone will answer with a why-don't-you-profile-it but I think there may be better approaches out there, somewhere.)
nullif will return null if the arguments are equal. A division by null evaluates to null
select
name,
sum(a) / nullif(sum(b), 0),
sum(c) / nullif(sum(d), 0)
from a_complex_nested_select_query_with_many_zeros
group by name
order by name;

Execute statements for every record in a table

I have a temporary table (or, say, a function which returns a table of values).
I want to execute some statements for each record in the table.
Can this be done without using cursors?
I'm not opposed to cursors, but would like a more elegant syntax\way of doing it.
Something like this randomly made-up syntax:
for (select A,B from #temp) exec DoSomething A,B
I'm using Sql Server 2005.
I dont think what you want to to is that easy.
What i have found is that you can create a scalar function taking the arguments A and B and then from within the function execute an Extended Stored Procedure. This might achieve what you want to do, but it seems that this might make the code even more complex.
I think for readibility and maintainability, you should stick to the CURSOR implementation.
I would look into changing the stored proc so that it can work against a set of data rather than a single row input.
Would CROSS/OUTER APPLY do what you want if you need RBAR processing.
It's elegant, but depends on what processing you need to do