Every time I do an INSERT or UPSERT (ON CONFLICT UPDATE), the auto-increment column on the table jumps ahead by the number of updates that came before it.
For instance, if I have this table:
id int4
title text
description text
updated_at timestamp
created_at timestamp
And then run these queries:
INSERT INTO notifications (title, description) VALUES ('something', 'whatever'); -- generates id = 1
UPDATE notifications SET title = 'something else' WHERE id = 1; -- repeat this query 20 times with different values
INSERT INTO notifications (title, description) VALUES ('something more', 'whatever again'); -- generates id = 22
This is a pretty big issue. The script we are running processes 100,000+ notifications every day, which can create gaps of around 10,000 between consecutive inserts, so we might start off with 100 rows but by the time we reach 1,000 rows the auto-incremented primary key of the last row is already over 100,000.
We will quickly run out of auto-increment values on our tables if this continues.
Is our PostgreSQL server misconfigured? Using Postgres 9.5.3.
I'm using Eloquent Schema Builder (e.g. $table->increments('id')) to create the table and I don't know if that has something to do with it.
A sequence is incremented whenever an insertion is attempted, regardless of whether it succeeds. A plain UPDATE (as in your example) will not increment it, but an INSERT ... ON CONFLICT DO UPDATE will, because the insert is attempted before the update.
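A minimal sketch of the effect (assuming a unique constraint on title, which ON CONFLICT needs as its arbiter):
CREATE TABLE notifications (
    id          serial PRIMARY KEY,
    title       text UNIQUE,
    description text
);

INSERT INTO notifications (title, description)
VALUES ('something', 'whatever')
ON CONFLICT (title) DO UPDATE SET description = excluded.description;  -- inserts id = 1

-- Running the same statement again takes the update path, but the
-- attempted insert has already consumed a sequence value:
INSERT INTO notifications (title, description)
VALUES ('something', 'whatever')
ON CONFLICT (title) DO UPDATE SET description = excluded.description;

INSERT INTO notifications (title, description)
VALUES ('other', 'whatever');  -- inserts id = 3, not 2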
One solution is to change the id to bigint. Another is not to use a sequence and to manage the ids yourself. And another is to do a manual upsert:
with s as (
    select id
    from notifications
    where title = 'something'
), i as (
    insert into notifications (title, description)
    select 'something', 'whatever'
    where not exists (select 1 from s)
)
update notifications
set title = 'something else'
where id = (select id from s);
This supposes title is unique.
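If you go the bigint route instead, the column change is a one-liner; the backing sequence can stay as it is, since a sequence's default maximum is already the bigint maximum:
ALTER TABLE notifications ALTER COLUMN id TYPE bigint;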
You can reset the auto-increment column to the maximum inserted value by running this command before the insert:
SELECT setval('notifications_id_seq', MAX(id)) FROM notifications;
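Note that MAX(id) is NULL on an empty table, which makes setval fail; a guarded variant of the same idea:
-- COALESCE avoids the NULL on an empty table (the next id will then be 2):
SELECT setval('notifications_id_seq', COALESCE(MAX(id), 1)) FROM notifications;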
Table: test_seq

id (varchar(8)) | raw_data (text)
----------------+-----------------
cd_1            | 'I'm text'
cd_2            | 'I'm more text'
CREATE SEQUENCE cd_seq CYCLE START 1 MAXVALUE 2;
ALTER TABLE test_seq
ALTER COLUMN id SET DEFAULT ('cd_'||nextval('cd_seq'))::VARCHAR(8);
UPDATE test_seq
SET raw_data = 'New Text'
WHERE "id" = 'cd_'||nextval('cd_seq')::VARCHAR(8);
I am making a table that will store raw data as a short-term backup, in case the data ingestion fails and we need to go back without extracting it again. I'm trying to set up a way to have the records get replaced once we have reached the ID limit.
So if I want 25 records in the table, then when the SEQUENCE cycles from the maximum ('cd_25') back to ('cd_1'), I want raw_data to get updated with the new data.
I've come up with the SEQUENCE and the DEFAULT value for the first inserts, but my UPDATE command won't update the records even when the "id" matches 'cd_'||nextval('cd_seq'), and it will sometimes UPDATE 9 rows at once.
I checked the values of "id" and 'cd_'||nextval('cd_seq') and they appear to match, but the WHERE clause doesn't work properly.
Am I missing something or am I overcomplicating things?
Thank you
While I agree with Adrian Klaver's comments that this approach is pretty fragile due to how sequences work, it can work if:
- you can make sure the column default value is the only call to the sequence,
- you don't mind skipped rows when an insert fails but the sequence still increments its value, and
- you can make sure all inserts handle conflicts as shown below.
Instead of trying to insert data by updating existing rows - which, by the way, forces you to prepopulate the table - just actually insert it and handle the conflict.
insert into test_seq
(text_column)
values
('e')
on conflict(id) do update set text_column=excluded.text_column;
This also lets you insert more than one row at a time (up to the maximum size of your table, i.e. the length of your sequence), which your current update approach does not, as I do in the test below.
drop sequence if exists cd_seq cascade;
create sequence cd_seq cycle start 1 maxvalue 4;
drop table if exists test_seq cascade;
create table test_seq
(id text primary key default ('cd_'||nextval('cd_seq'))::VARCHAR(8),
text_column text);
insert into test_seq
(text_column)
values
('a'),
('b'),
('c'),
('d')
on conflict(id) do update set text_column=excluded.text_column;
select id, text_column from test_seq;
-- id | text_column
--------+-------------
-- cd_1 | a
-- cd_2 | b
-- cd_3 | c
-- cd_4 | d
--(4 rows)
insert into test_seq
(text_column)
values
('e'),
('f')
on conflict(id) do update set text_column=excluded.text_column;
select id, text_column from test_seq;
-- id | text_column
--------+-------------
-- cd_3 | c
-- cd_4 | d
-- cd_1 | e
-- cd_2 | f
--(4 rows)
If you try to insert more rows than the length of your sequence, you'll get
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same
command have duplicate constrained values.
If, in your current solution, you gave your update a source table with multiple rows and their number also exceeded the sequence length, it wouldn't raise an error - of each conflicting pair you'd simply get the last row. Here's your update, fixed (though it still requires that your table is pre-populated):
with new as (
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'g' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'h' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'i' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'j' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'k' text_column)
update test_seq old
set text_column=new.text_column
from new
where old.id=new.id;
I execute several processes that continuously (update + select) data from PostgreSQL (with autocommit enabled), and they get back rows that should already have been filtered out by updates previously executed by other processes.
This is essentially a queue: all rows start with status=0 and are selected row by row in some complicated (not shown) order. Each claimed row is set to status=1, pid=(process id) and uniqueid=(some iterating id), yet different processes somehow end up sharing the same rows.
An oversimplified (but still incorrectly working) example is below:
Table definition:
create table Todq (
id bigserial primary key,
url varchar(1024),
status integer not null,
pid integer,uniqueid bigint,
priority integer
);
create index Iodq1 on Todq (status,priority,id);
create index Iodq2 on Todq (pid,uniqueid);
Update (gets next element in queue, sets process pid and iterating uniqueid):
update Todq odq2
set status=1,pid=?,uniqueid=?
from (
select odq.id
from Todq odq
where odq.status=0
order by odq.priority desc,odq.id
limit 1
) odq1
where odq2.id=odq1.id
Select (selects by process id and incrementable uniqueid):
select odq.id,odq.url,odq.pid,odq.uniqueid
from Todq odq
where odq.status=1 and pid=? and uniqueid=?
I made a test bench in Perl which writes the selected values to logs/$$.log, and as a result different logs share the same entries (though with different pid and uniqueid values).
The answer is to use (select ... for update) in the subquery, so a row is locked at the moment it is picked:
update Todq odq2
set status=1,pid=?,uniqueid=?
from (
select odq.id
from Todq odq
where odq.status=0
order by odq.priority desc,odq.id
limit 1
for update
) odq1
where odq2.id=odq1.id
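On PostgreSQL 9.5 and later you can also append SKIP LOCKED, so that a worker finding the top row already claimed moves on to the next one instead of blocking; a sketch of the same update:
update Todq odq2
set status=1,pid=?,uniqueid=?
from (
    select odq.id
    from Todq odq
    where odq.status=0
    order by odq.priority desc,odq.id
    limit 1
    for update skip locked
) odq1
where odq2.id=odq1.id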
I have a situation where I have multiple (potentially hundreds of) threads repeating the same task (using a Java scheduled executor, if you are curious). This task entails selecting rows of changes (from a table called change) that have not yet been processed (processed changes are tracked in an m:n join table called change_process_rel that records the process id, record id and status), processing them, then updating the status back.
My question is: what is the best way to prevent two threads of the same process from selecting the same row? Will the solution below (using for update to lock rows) work? If not, please suggest a working solution.
Create table change (
    -- id, autogenerated pk
    -- other fields
);
Create table change_process_rel (
    -- change id (pk of change table)
    -- process id (pk of process table)
    -- status
);
The query I would use is listed below:
Select * from
change c
where c.id not in (select changeid from change_process_rel with cs) for update
Please let me know if this would work.
You have to "lock" a row which you are going to process somehow. Such "locking" should, of course, be concurrency-safe with a minimum of conflicts and errors.
One way is as follows:
Create table change
(
id int not null generated always as identity
, v varchar(10)
) in userspace1;
insert into change (v) values ('1'), ('2'), ('3');
Create table change_process_rel
(
id int not null
, pid int not null
, status int not null
) in userspace1;
create unique index change_process_rel1 on change_process_rel(id);
Now you should be able to run the same statement from multiple concurrent sessions:
SELECT ID
FROM NEW TABLE
(
insert into change_process_rel (id, pid, status)
select c.id, mon_get_application_handle(), 1
from change c
where not exists (select 1 from change_process_rel r where r.id = c.id)
fetch first 1 row only
with ur
);
Each such statement inserts 1 or 0 rows into the change_process_rel table, which is used here as a "lock" table. The corresponding ID from change is returned, and you may proceed with processing the corresponding event in the same transaction.
If the transaction completes successfully, the row inserted into change_process_rel is kept, so the corresponding id from change can be considered processed. If the transaction fails, the corresponding "lock" row disappears from change_process_rel, and the row may be processed later by this or another application.
The problem with this method is that, once both tables become large enough, the sub-select may not perform as quickly as before.
Another method is to use "Evaluate uncommitted data through lock deferral".
It requires placing the status column in the change table itself.
Unfortunately, Db2 for LUW doesn't have SKIP LOCKED functionality, which might help with this sort of algorithm.
If, let's say, status=0 means "not processed" and status<>0 is some processing / processed status, then after setting the DB2_EVALUNCOMMITTED and DB2_SKIP* registry variables and restarting the instance, you may "catch" the next ID for processing with the following statement:
SELECT ID
FROM NEW TABLE
(
update
(
select id, status
from change
where status=0
fetch first 1 row only
)
set status=1
);
Once you get it, you may do further processing of this ID in the same transaction as previously.
It's good to create an index for performance:
create index change1 on change(status);
and maybe mark this table as volatile, or periodically collect distribution statistics on this column in addition to the regular statistics on the table and its indexes.
Note that these registry variable settings have instance-wide effect, and you should keep that in mind.
I got this trigger as a tip, and I would like to know how it works with updates. It is supposed to create a record every time there is an update or insert on my main table.
create trigger tblTriggerAuditRecord on tblOrders
after update, insert
as
begin
    insert into tblOrdersAudit
        (OrderID, OrderApprovalDateTime, OrderStatus, UpdatedBy, UpdatedOn)
    select i.OrderID, i.OrderApprovalDateTime, i.OrderStatus, SUSER_SNAME(), getdate()
    from tblOrders t
    inner join inserted i on t.OrderID = i.OrderID
end
go
From my understanding, it writes all the inserted records of the main table into the stated columns of the audit table, including the timestamp and user, but what about updates? If I update rows in my main table, shouldn't I also have a join on the updated records?
Hope my question is clear. Thanks a lot for the help!
There is no updated table when a trigger is fired. In the case of an update, you'll find the old values from your main table in the deleted table, while the new ones are (as in the case of an insert) in the inserted table.
That's the same as in this example:
UPDATE tabEmployee SET Salary = Salary * 1.05
OUTPUT inserted.EmployeeName, deleted.Salary, inserted.Salary
INTO tabSalaryHistory (EmployeeName, OldSalary, NewSalary)
In this example, every employee gets a salary increase. The value before the increase is read from deleted, the new value from inserted, and both are stored in tabSalaryHistory.
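Applied to your trigger, the same idea lets you record the previous status as well. A sketch, assuming a hypothetical OldOrderStatus column in tblOrdersAudit (it will be NULL for plain inserts):
create trigger tblTriggerAuditRecord on tblOrders
after update, insert
as
begin
    insert into tblOrdersAudit
        (OrderID, OrderApprovalDateTime, OrderStatus, OldOrderStatus, UpdatedBy, UpdatedOn)
    select i.OrderID, i.OrderApprovalDateTime, i.OrderStatus, d.OrderStatus, SUSER_SNAME(), getdate()
    from inserted i
    left join deleted d on d.OrderID = i.OrderID  -- deleted is empty for inserts, so d.OrderStatus is NULL
end
go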
I need to automatically insert a row into a stats table, identified by the month number, if a row for the new month does not exist yet.
'cards' (the IDS table below) is a running count of individual IDs: it stores a current value (which gets reset at rollover time), a rollover count, and a running total of all events on that ID.
'stats' keeps a running count of all IDs' events and how many rollovers occurred in a given month.
CREATE TABLE IDS (ID_Num VARCHAR(30), Curr_Count INT, Rollover_Count INT, Total_Count INT);
CREATE TABLE stats(Month char(10), HitCount int, RolloverCount int);
CREATE TRIGGER update_Tstats BEFORE UPDATE OF Total_Count ON IDS
WHEN 0=(SELECT HitCount FROM stats WHERE Month = strftime('%m','now'))
BEGIN
    INSERT INTO stats (Month, HitCount, RolloverCount) VALUES (strftime('%m', 'now'), 0, 0);
END;
(I also tried an "IS NULL" at the other end of the WHEN clause... still no joy.)
I did have it working to a point, but as the rollover count was updated twice per cycle (the value is changed up and down via an SQL query in a Python script), it gave me double-ups in the stats rollover count. So now I'm running a double query in my script. However, this all falls over if the current month number does not exist in the stats table.
All I need to do is check whether a blank record exists for the current month for the Python script's UPDATE queries to run against, and if not, INSERT one. The script itself can't do a 'run once' type of query at initial startup, because it may run for days, including spanning a changeover to a new month.
Any assistance would be hugely appreciated.
To check whether a record exists, use EXISTS:
CREATE TRIGGER ...
WHEN NOT EXISTS (SELECT 1 FROM stats WHERE Month = ...)
BEGIN
INSERT INTO stats ...
END;
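Adapted to the trigger from the question, it would look something like this (a sketch based on the schema above):
CREATE TRIGGER update_Tstats BEFORE UPDATE OF Total_Count ON IDS
WHEN NOT EXISTS (SELECT 1 FROM stats WHERE Month = strftime('%m', 'now'))
BEGIN
    INSERT INTO stats (Month, HitCount, RolloverCount)
    VALUES (strftime('%m', 'now'), 0, 0);
END;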