If there's a queue of work to do in a table that will be periodically polled by a number of different worker clients, what's the best way to prevent the workers from getting the same item to work on?
Say a table like: ItemId, LastAttemptDateTime, AttemptCount, and various item details.
There's an index on LastAttemptDateTime, sorted in ascending order, and various clients query the table to grab an item to be worked on.
I use a stored procedure in MS SQL to do this...something like:
CREATE PROCEDURE GetNextQueueItem AS
SET NOCOUNT ON

DECLARE @ItemId INT

UPDATE myqueue
SET    @ItemId = ItemId,
       AttemptCount = AttemptCount + 1,
       LastAttemptDateTime = GETDATE()
WHERE  ItemId = (SELECT TOP 1 ItemId
                 FROM myqueue
                 ORDER BY LastAttemptDateTime ASC)

SELECT ItemId, AttemptCount -- plus the various item detail fields
FROM   myqueue
WHERE  ItemId = @ItemId
I'm fairly new to PostgreSQL and was wondering if there are alternate approaches available. (The TOP 1 will change to LIMIT 1.)
A PostgreSQL equivalent could look like this:
CREATE OR REPLACE FUNCTION get_next_queue_item()
  RETURNS SETOF myqueue AS
$BODY$
BEGIN

RETURN QUERY
UPDATE myqueue
SET    attempt_count = attempt_count + 1
     , last_attempt_ts = now()
WHERE  item_id = (
         SELECT item_id
         FROM   myqueue
         ORDER  BY last_attempt_ts
         LIMIT  1
         )
RETURNING myqueue.*;

END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Major points
You only need one statement to do it all. UPDATE can return the updated row in the same command with the RETURNING clause.
The returned state of the row is post-update. There are ways to get the pre-update state if needed (see the sketch after this list).
No need for any variables.
I changed all identifiers to lower case, which is the cleanest style in PostgreSQL.
I renamed your column LastAttemptDateTime to last_attempt_ts: "ts" for "timestamp", because timestamp is the name of the timestamp / datetime type in Postgres.
As you mentioned yourself, LIMIT 1 instead of TOP 1.
I use RETURNS SETOF myqueue as return type.
myqueue is the associated row-type of the table myqueue - for every table or view a row-type of the same name is automatically created in PostgreSQL.
This declaration allows for multiple rows to be returned, but LIMIT 1 guarantees that it will only ever be one.
This return type allows for RETURN QUERY to return the resulting row directly without any intermediate step. Fast, clean.
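For instance, to also get the pre-update value of attempt_count, you can join the old row in a FROM clause and include it in RETURNING. A minimal sketch (the prev alias and old_attempt_count name are just for illustration):

UPDATE myqueue m
SET    attempt_count = m.attempt_count + 1
     , last_attempt_ts = now()
FROM  (
   SELECT item_id, attempt_count AS old_attempt_count
   FROM   myqueue
   ORDER  BY last_attempt_ts
   LIMIT  1
   ) prev
WHERE  m.item_id = prev.item_id
RETURNING m.*, prev.old_attempt_count;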
Actually, you don't need a plpgsql function at all. You can do it with a simple SQL statement:
UPDATE myqueue
SET    attempt_count = attempt_count + 1
     , last_attempt_ts = now()
WHERE  item_id = (
         SELECT item_id
         FROM   myqueue
         ORDER  BY last_attempt_ts
         LIMIT  1
         )
RETURNING myqueue.*;
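Note that with multiple concurrent workers, two transactions can still pick the same row, because the subquery does not lock it. On Postgres 9.5 or later you can avoid that with FOR UPDATE SKIP LOCKED (a sketch, not part of the original procedure):

UPDATE myqueue
SET    attempt_count = attempt_count + 1
     , last_attempt_ts = now()
WHERE  item_id = (
         SELECT item_id
         FROM   myqueue
         ORDER  BY last_attempt_ts
         LIMIT  1
         FOR    UPDATE SKIP LOCKED
         )
RETURNING myqueue.*;

Each worker then skips rows already claimed by another transaction instead of blocking on them or double-processing.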
Since PostgreSQL sequences are objects separate from the identity columns they increment, they can be used for other things too. One nice approach is to have one sequence that sets the id on the table, and a second one for handing out items:
1. Look at the currval of the second sequence; if it's higher than or equal to the max id of the table, there are no items waiting.
2. Obtain nextval. If there is no item with a matching id, loop back to step 1 (this can happen if an insert to the table failed).
3. Obtain the row with the matching id.
This isn't the only way to skin this cat (and not the way I've used with other databases), but it has the advantage of being light on writes to the database (altering only the sequence, not the table).
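A minimal sketch of that approach, assuming the queue's item_id is fed by one sequence and a second sequence myqueue_reader_seq tracks the read position (both names are hypothetical):

CREATE SEQUENCE myqueue_reader_seq;

CREATE OR REPLACE FUNCTION get_next_item()
  RETURNS myqueue AS
$BODY$
DECLARE
   item myqueue;
BEGIN
   LOOP
      -- step 1: nothing is waiting if the reader has caught up with the writer
      IF (SELECT last_value FROM myqueue_reader_seq)
         >= (SELECT coalesce(max(item_id), 0) FROM myqueue) THEN
         RETURN NULL;
      END IF;

      -- step 2: claim the next id
      SELECT * INTO item
      FROM   myqueue
      WHERE  item_id = nextval('myqueue_reader_seq');

      -- step 3: return the row; if that id was never inserted, loop back
      IF FOUND THEN
         RETURN item;
      END IF;
   END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;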
Related
I would like to create a trigger function inside my database which checks, if the newly "inserted" value (max_bid) is at least +1 greater than the largest max_bid value currently in the table.
If this is the case, the max_bid value inside the table should be updated, although not with the newly "inserted" value, but instead it should be increased by 1.
For instance, if max_bid is 10 and the newly "inserted" max_bid is 20, the max_bid value inside the table should be increased by +1 (in this case 11).
I tried to do it with a trigger, but unfortunately it doesn't work. Please help me solve this problem.
Here is my code:
CREATE TABLE bidtable (
  mail_buyer VARCHAR(80) NOT NULL,
  auction_id INTEGER NOT NULL,
  max_bid INTEGER,
  PRIMARY KEY (mail_buyer)
);
CREATE OR REPLACE FUNCTION max_bid()
RETURNS TRIGGER LANGUAGE PLPGSQL AS $$
DECLARE
current_maxbid INTEGER;
BEGIN
SELECT MAX(max_bid) INTO current_maxbid
FROM bidtable WHERE NEW.auction_id = OLD.auction_id;
IF (NEW.max_bid < (current_maxbid + 1)) THEN
RAISE EXCEPTION 'error';
RETURN NULL;
END IF;
UPDATE bidtable SET max_bid = (current_maxbid + 1)
WHERE NEW.auction_id = OLD.auction_id
AND NEW.mail_buyer = OLD.mail_buyer;
RETURN NEW;
END;
$$;
CREATE OR REPLACE TRIGGER max_bid_trigger
BEFORE INSERT
ON bidtable
FOR EACH ROW
EXECUTE PROCEDURE max_bid();
Thank you very much for your help.
In a trigger function that is called for an INSERT operation the OLD implicit record variable is null, which is probably the cause of "unfortunately it doesn't work".
Trigger function
In a case like this there is a much easier solution. First of all, disregard the value for max_bid upon input because you require a specific value in all cases. Instead, you are going to set it to that specific value in the function. The trigger function can then be simplified to:
CREATE OR REPLACE FUNCTION set_max_bid() -- function name different from column name
RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  -- COALESCE covers the first bid of an auction, where MAX() returns NULL
  SELECT COALESCE(MAX(max_bid), 0) + 1 INTO NEW.max_bid
  FROM   bidtable
  WHERE  auction_id = NEW.auction_id;
  RETURN NEW;
END; $$;
That's all there is to it for the trigger function. Update the trigger to the new function name and it should work.
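The updated trigger would then look like this (the same definition as before, just pointing at the new function):

CREATE OR REPLACE TRIGGER max_bid_trigger
  BEFORE INSERT
  ON bidtable
  FOR EACH ROW
  EXECUTE PROCEDURE set_max_bid();

(CREATE OR REPLACE TRIGGER requires Postgres 14 or later; on older versions, DROP TRIGGER first.)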
Concurrency
As several comments to your question pointed out, you run the risk of getting duplicates. This will currently not generate an error because you do not have an appropriate constraint on your table. Avoiding duplicates requires a table constraint like:
UNIQUE (auction_id, max_bid)
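Added to the existing table, that could look like this (the constraint name is illustrative):

ALTER TABLE bidtable
ADD CONSTRAINT bidtable_auction_id_max_bid_key UNIQUE (auction_id, max_bid);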
You cannot deal with any concurrency issue in the trigger function because the INSERT operation will take place after the trigger function completes with a RETURN NEW statement. What would be the most appropriate way to deal with this depends on your application. Your options are table locking to block any concurrent inserts, or looping in a function until the insert succeeds.
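For illustration, a retry loop could be wrapped around the INSERT like this (a sketch only, assuming the UNIQUE constraint above; the function name and signature are made up):

CREATE OR REPLACE FUNCTION place_bid(p_auction_id integer, p_mail_buyer varchar)
RETURNS integer LANGUAGE plpgsql AS $$
DECLARE
  new_bid integer;
BEGIN
  LOOP
    BEGIN
      INSERT INTO bidtable (auction_id, mail_buyer)
      VALUES (p_auction_id, p_mail_buyer)
      RETURNING max_bid INTO new_bid;  -- the BEFORE trigger fills in max_bid
      RETURN new_bid;
    EXCEPTION WHEN unique_violation THEN
      NULL;  -- a concurrent insert claimed the same number; loop and retry
    END;
  END LOOP;
END; $$;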
Avoid the concurrency issue altogether
If you can change the structure of the bidtable table, you can get rid of the whole concurrency issue by changing your business logic to not require the max_bid column. The max_bid column appears to indicate the order in which bids were placed for each auction_id. If that is the case, then you could add a serial column to your table and use that to indicate the order of bids being placed (across all auctions). That serial column could then also be the PRIMARY KEY, making your table leaner (no index on a large text column). The table would look something like this:
CREATE TABLE bidtable (
id SERIAL PRIMARY KEY,
mail_buyer VARCHAR(80) NOT NULL,
auction_id INTEGER NOT NULL
);
You can drop your trigger and trigger function and just depend on the proper id value being supplied by the system.
The bids for a specific action can then be extracted using a straightforward SELECT:
SELECT id, mail_buyer
FROM bidtable
WHERE auction_id = xxx
ORDER BY id;
If you require a max_bid-like value (the id values increment over the full set of auctions), you can use a simple window function:
SELECT mail_buyer, row_number() OVER (PARTITION BY auction_id ORDER BY id) AS max_bid
FROM bidtable
WHERE auction_id = xxx;
I'm moving from MySQL to Postgres, and I noticed that when you delete rows from MySQL, the unique ids for those rows are re-used when you make new ones. With Postgres, if you create rows and delete them, the unique ids are not used again.
Is there a reason for this behaviour in Postgres? Can I make it act more like MySQL in this case?
Sequences have gaps to permit concurrent inserts. Attempting to avoid gaps or to re-use deleted IDs creates horrible performance problems. See the PostgreSQL wiki FAQ.
PostgreSQL SEQUENCEs are used to allocate IDs. These only ever increase, and they're exempt from the usual transaction rollback rules to permit multiple transactions to grab new IDs at the same time. This means that if a transaction rolls back, those IDs are "thrown away"; there's no list of "free" IDs kept, just the current ID counter. Sequences are also usually incremented if the database shuts down uncleanly.
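A quick demonstration of that exemption from rollback (assuming a sequence some_seq exists):

BEGIN;
SELECT nextval('some_seq');  -- say this returns 42
ROLLBACK;
SELECT nextval('some_seq');  -- returns 43; the rolled-back 42 is gone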
Synthetic keys (IDs) are meaningless anyway. Their order is not significant, their only property of significance is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say if one is greater or less than another. All you can do is say "equal" or "not equal". Anything else is unsafe. You shouldn't care about gaps.
If you need a gapless sequence that re-uses deleted IDs, you can have one, you just have to give up a huge amount of performance for it - in particular, you cannot have any concurrency on INSERTs at all, because you have to scan the table for the lowest free ID, locking the table for write so no other transaction can claim the same ID. Try searching for "postgresql gapless sequence".
The simplest approach is to use a counter table and a function that gets the next ID. Here's a generalized version that uses a counter table to generate consecutive gapless IDs; it doesn't re-use IDs, though.
CREATE TABLE thetable_id_counter ( last_id integer not null );
INSERT INTO thetable_id_counter VALUES (0);
CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
DECLARE
next_value integer;
BEGIN
EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
RETURN next_value;
END;
$$ LANGUAGE plpgsql;
COMMENT ON FUNCTION get_next_id(regclass, text) IS 'Increment and return value from integer column $2 in table $1';
Usage:
INSERT INTO dummy(id, blah)
VALUES ( get_next_id('thetable_id_counter','last_id'), 42 );
Note that when one open transaction has obtained an ID, all other transactions that try to call get_next_id will block until the first transaction commits or rolls back. This is unavoidable; for gapless IDs it is by design.
If you want to store multiple counters for different purposes in a table, just add a parameter to the above function, add a column to the counter table, and add a WHERE clause to the UPDATE that matches the parameter to the added column. That way you can have multiple independently-locked counter rows. Do not just add extra columns for new counters; they would all share a single row's lock.
This function does not re-use deleted IDs, it just avoids introducing gaps.
To re-use IDs I advise ... not re-using IDs.
If you really must, you can do so by adding an ON INSERT OR UPDATE OR DELETE trigger on the table of interest that adds deleted IDs to a free-list side table and removes them from that table when they're INSERTed. Treat an UPDATE as a DELETE followed by an INSERT. Now modify the ID generation function above so that it does a SELECT free_id INTO next_value FROM free_ids LIMIT 1 FOR UPDATE and, if a row is found, DELETEs it. IF NOT FOUND, it gets a new ID from the generator table as normal. Here's an untested extension of the prior function to support re-use:
CREATE OR REPLACE FUNCTION get_next_id_reuse(countertable regclass, countercolumn text, freelisttable regclass, freelistcolumn text) RETURNS integer AS $$
DECLARE
next_value integer;
BEGIN
EXECUTE format('SELECT %I FROM %s LIMIT 1 FOR UPDATE', freelistcolumn, freelisttable) INTO next_value;
IF next_value IS NOT NULL THEN
EXECUTE format('DELETE FROM %s WHERE %I = %L', freelisttable, freelistcolumn, next_value);
ELSE
EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
END IF;
RETURN next_value;
END;
$$ LANGUAGE plpgsql;
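For completeness, the free-list maintenance trigger could look roughly like this (an untested sketch; thetable, its id column, and free_ids(free_id) are assumed names):

CREATE TABLE free_ids (free_id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION track_free_ids() RETURNS trigger AS $$
BEGIN
  IF TG_OP IN ('DELETE', 'UPDATE') THEN
    -- the old ID is no longer in use: put it on the free-list
    INSERT INTO free_ids (free_id) VALUES (OLD.id);
  END IF;
  IF TG_OP IN ('INSERT', 'UPDATE') THEN
    -- the new ID is now taken: remove it from the free-list if present
    DELETE FROM free_ids WHERE free_id = NEW.id;
  END IF;
  RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER thetable_track_free_ids
AFTER INSERT OR UPDATE OR DELETE ON thetable
FOR EACH ROW EXECUTE PROCEDURE track_free_ids();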
I have a table in postgres 9.4 and I need to do the following:
When a newly inserted record comes in, I need to find the previous record with the given parameters and set its 'next_message' column to the newly generated record. So I want each record to have a reference to the next one for a given filter, for example 'session_id'. So, if session_id = 5, each record with session_id = 5 should reference the next one. I created a trigger that selects the previous record and sets this field. But it's bad and it will not work in a highly loaded db table. How to do that?
That's my trigger:
CREATE OR REPLACE FUNCTION "public"."messages_compute_next_message" () RETURNS trigger
VOLATILE
AS $dbvis$
DECLARE previous_session_message integer;
BEGIN
/*NEW.next_message_id=NEW.id;*/
update message set next_message_id=NEW.id where id=(select max(c.id) from message c where c.session_id=NEW.session_id and c.id<>NEW.id);
RETURN NEW;
END
$dbvis$ LANGUAGE plpgsql
If I post records too frequently, I get many null values in the next_message_id field. That's logical; otherwise I would have to lock the entire table on every insert. How can I do this properly in Postgres?
I'd say forget about next_message_id.
The way your trigger looks, it seems to me that messages are ordered by id within one session.
So if you need the previous message for the message with id 42, you can find it with
SELECT max(prev.id)
FROM message prev JOIN message curr
ON prev.session_id = curr.session_id
AND prev.id < curr.id
WHERE curr.id = 42;
Setting the right indexes will speed this up considerably.
I assume there already is an index on id; maybe a second index ON (session_id, id) will help.
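For example (the index name is illustrative):

CREATE INDEX message_session_id_id_idx ON message (session_id, id);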
In PostgreSQL (9.3) I have a table defined as:
CREATE TABLE charts
( recid serial NOT NULL,
groupid text NOT NULL,
chart_number integer NOT NULL,
"timestamp" timestamp without time zone NOT NULL DEFAULT now(),
modified timestamp without time zone NOT NULL DEFAULT now(),
donotsee boolean,
CONSTRAINT pk_charts PRIMARY KEY (recid),
CONSTRAINT chart_groupid UNIQUE (groupid),
CONSTRAINT charts_ichart_key UNIQUE (chart_number)
);
CREATE TRIGGER update_modified
BEFORE UPDATE ON charts
FOR EACH ROW EXECUTE PROCEDURE update_modified();
I would like to replace the chart_number with a sequence like:
CREATE SEQUENCE charts_chartnumber_seq START 16047;
So that, by trigger or function, adding a new chart record automatically generates a new chart number in ascending order. However, no existing chart record can have its chart number changed, and over the years there have been skips in the assigned chart numbers. Hence, before assigning a new chart number to a new chart record, I need to be sure that the "new" chart number has not yet been used, and that no existing chart record ends up assigned a different number.
How can this be done?
Consider not doing it. Read these related answers first:
Gap-less sequence where multiple transactions with multiple tables are involved
Compacting a sequence in PostgreSQL
If you still insist on filling in gaps, here is a rather efficient solution:
1. To avoid searching large parts of the table for the next missing chart_number, create a helper table with all current gaps once:
CREATE TABLE chart_gap AS
SELECT chart_number
FROM generate_series(1, (SELECT max(chart_number) - 1 -- max is no gap
FROM charts)) chart_number
LEFT JOIN charts c USING (chart_number)
WHERE c.chart_number IS NULL;
2. Set charts_chartnumber_seq to the current maximum and convert chart_number to an actual serial column:
SELECT setval('charts_chartnumber_seq', max(chart_number)) FROM charts;
ALTER TABLE charts
ALTER COLUMN chart_number SET NOT NULL
, ALTER COLUMN chart_number SET DEFAULT nextval('charts_chartnumber_seq');
ALTER SEQUENCE charts_chartnumber_seq OWNED BY charts.chart_number;
Details:
How to reset postgres' primary key sequence when it falls out of sync?
Safely and cleanly rename tables that use serial primary key columns in Postgres?
3. While chart_gap is not empty, fetch the next chart_number from there.
To resolve possible race conditions with concurrent transactions, without making transactions wait, use advisory locks:
WITH sel AS (
SELECT chart_number, ... -- other input values
FROM chart_gap
WHERE pg_try_advisory_xact_lock(chart_number)
LIMIT 1
FOR UPDATE
)
, ins AS (
INSERT INTO charts (chart_number, ...) -- other target columns
TABLE sel
RETURNING chart_number
)
DELETE FROM chart_gap c
USING ins i
WHERE i.chart_number = c.chart_number;
Alternatively, Postgres 9.5 or later has the handy FOR UPDATE SKIP LOCKED to make this simpler and faster:
...
SELECT chart_number, ... -- other input values
FROM chart_gap
LIMIT 1
FOR UPDATE SKIP LOCKED
...
Detailed explanation:
Postgres UPDATE ... LIMIT 1
Check the result. Once all rows are filled in, this returns 0 rows affected. (You could check in plpgsql with IF NOT FOUND THEN ....) Then switch to a simple INSERT:
INSERT INTO charts (...) -- don't list chart_number
VALUES (...); -- don't provide chart_number
In PostgreSQL, a SEQUENCE ensures the two requirements you mention, that is:
No repeats
No changes once assigned
But because of how a SEQUENCE works (see the manual), it cannot guarantee the absence of skips. Among others, the first two reasons that come to mind are:
How a SEQUENCE handles concurrent INSERTs (the sequence cache also makes gapless operation impossible).
User-triggered DELETEs are also an uncontrollable aspect that a SEQUENCE cannot handle by itself.
In both cases, if you still do not want skips (and if you really know what you're doing), you should have a separate structure that assigns IDs instead of a SEQUENCE: basically a system that keeps a list of 'assignable' IDs in a TABLE, with a function to pop IDs out in FIFO order. That would let you control DELETEs etc.
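Such a pop function might look like this (only a sketch; it relies on FOR UPDATE SKIP LOCKED from Postgres 9.5+, and all names are made up for illustration):

CREATE TABLE assignable_ids (id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION pop_assignable_id() RETURNS integer AS $$
  DELETE FROM assignable_ids
  WHERE  id = (SELECT id
               FROM   assignable_ids
               ORDER  BY id
               LIMIT  1
               FOR    UPDATE SKIP LOCKED)
  RETURNING id;
$$ LANGUAGE sql;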
But again, this should be attempted only if you really know what you're doing! There's a reason why people don't roll their own SEQUENCEs. There are hard corner cases (e.g. concurrent INSERTs), and most probably you're over-engineering your problem, which could probably be solved in a much better and cleaner way.
Sequence numbers usually have no meaning, so why worry? But if you really want this, then follow the cumbersome procedure below. Note that it is not efficient; the only efficient option is to forget about the holes and use the sequence.
In order to avoid having to scan the charts table on every insert, you should scan the table once and store the unused chart_number values in a separate table:
CREATE TABLE charts_unused_chart_number AS
SELECT seq.unused
FROM   (SELECT max(chart_number) FROM charts) mx,
       generate_series(1, mx.max) seq(unused)
LEFT   JOIN charts ON charts.chart_number = seq.unused
WHERE  charts.recid IS NULL;
The above query generates a contiguous series of numbers from 1 to the current maximum chart_number value, then LEFT JOINs the charts table to it to find the values of the series with no corresponding charts data, meaning those values are unused as a chart_number.
Next you create a trigger that fires on an INSERT on the charts table. In the trigger function, pick a value from the table created in the step above:
CREATE FUNCTION pick_unused_chart_number() RETURNS trigger AS $$
BEGIN
-- Get an unused chart number
SELECT unused INTO NEW.chart_number FROM charts_unused_chart_number LIMIT 1;
-- If the table is empty, get one from the sequence
IF NOT FOUND THEN
NEW.chart_number := nextval('charts_chartnumber_seq');
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_charts_cn
BEFORE INSERT ON charts
FOR EACH ROW EXECUTE PROCEDURE pick_unused_chart_number();
Easy. But the INSERT may fail because of some other trigger aborting the procedure or any other reason. So you need a check to ascertain that the chart_number was indeed inserted:
CREATE FUNCTION verify_chart_number() RETURNS trigger AS $$
BEGIN
-- If you get here, the INSERT was successful, so delete the chart_number
-- from the temporary table.
DELETE FROM charts_unused_chart_number WHERE unused = NEW.chart_number;
RETURN NULL; -- the return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tr_charts_verify
AFTER INSERT ON charts
FOR EACH ROW EXECUTE PROCEDURE verify_chart_number();
At a certain point the table with unused chart numbers will be empty, whereupon you can (1) ALTER TABLE charts to use the sequence instead of a plain integer for chart_number; (2) delete the two triggers; and (3) drop the table with unused chart numbers; all in a single transaction.
While what you want is possible, it can't be done using only a SEQUENCE and it requires an exclusive lock on the table, or a retry loop, to work.
You'll need to:
1. LOCK thetable IN EXCLUSIVE MODE.
2. Find the first free ID by querying for the max id, then doing a left join over generate_series to find the first free entry, if there is one.
3. If there is a free entry, insert it.
4. If there is no free entry, call nextval and return the result.
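Putting those steps together, roughly (a sketch; the table and sequence names are hypothetical):

BEGIN;
LOCK TABLE thetable IN EXCLUSIVE MODE;

-- first free ID below the current max, else the next sequence value
SELECT coalesce(
         (SELECT min(g.id)
          FROM   generate_series(1, (SELECT max(id) FROM thetable)) g(id)
          LEFT   JOIN thetable t USING (id)
          WHERE  t.id IS NULL)
       , nextval('thetable_id_seq')) AS next_free_id;

-- INSERT the new row using that id, then:
COMMIT;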
Performance will be absolutely horrible, and transactions will be serialized. There'll be no concurrency. Also, unless the LOCK is the first thing you run that affects that table, you'll face deadlocks that cause transaction aborts.
You can make this less bad by using an AFTER DELETE ... FOR EACH ROW trigger that keeps track of entries you delete by INSERTing them into a one-column table of spare IDs. You can then SELECT the lowest ID from that table in your ID-assignment function (or in the column default), avoiding the need for the explicit table lock, the left join on generate_series and the max call. Transactions will still serialize on a lock on the free-IDs table. In PostgreSQL you can even solve that using SELECT ... FOR UPDATE SKIP LOCKED, so if you're on 9.5 you can actually make this non-awful, though it'll still be slow.
I strongly advise you to just use a SEQUENCE directly, and not bother with re-using values.
Here is a table that has fields id, id_user, order_id.
When creating a record, I need to find the user's last order number and insert the next one in sequence.
I wrote a stored procedure that computes the next order number for the user, but even so it does not produce unique order numbers.
CREATE OR REPLACE FUNCTION get_next_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $function$
DECLARE
  next_order_num bigint;
BEGIN
  SELECT order_id + 1 INTO next_order_num
  FROM   payment_out
  WHERE  payment_out.id_user = NEW.id_user
  AND    payment_out.order_id IS NOT NULL
  ORDER  BY payment_out.order_id DESC
  LIMIT  1;

  -- if no payments exist yet, start at 1
  NEW.order_id := coalesce(next_order_num, 1);
  RETURN NEW;
END;
$function$;
CREATE TRIGGER get_next_order
BEFORE INSERT ON payment_out
FOR EACH ROW
EXECUTE PROCEDURE get_next_order();
How can I avoid duplicate order numbers?
For this to work in the presence of multiple concurrent transactions inserting orders for the same user, you need a lock on a particular record to make them wait and execute serially.
e.g., before the first SELECT, you might:
PERFORM 1 FROM "users" where id_user = NEW.id_user FOR UPDATE;
where you lock the parent "users" record that owns the orders.
Otherwise, multiple concurrent transactions could execute your procedure at the same time, but they can't see each others' inserted values, so they'll pick the same numbers.
However, beware: a foreign key constraint will already cause a SHARE lock to be taken on the users entry when you insert into a table that references it. Your trigger will try to upgrade that to an UPDATE lock, but multiple transactions might already hold the SHARE lock, so this will block. You'll end up with transactions all waiting for each other, until PostgreSQL kills all but one of them with a deadlock abort error. The only way to avoid this is for the application to SELECT 1 FROM users WHERE id_user = blahblah FOR UPDATE before it creates the orders for that user.
A variant is to keep a next_order_id field in users and do an UPDATE users SET next_order_id = next_order_id + 1 RETURNING next_order_id, and use the result of that to set the order ID. The same lock upgrade problem applies.
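That variant might look like this (a sketch, assuming users has an integer next_order_id column defaulting to 0; the same lock-upgrade caveat applies):

CREATE OR REPLACE FUNCTION get_next_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $function$
BEGIN
  UPDATE users
  SET    next_order_id = next_order_id + 1
  WHERE  id_user = NEW.id_user
  RETURNING next_order_id INTO NEW.order_id;
  RETURN NEW;
END;
$function$;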