Cap on number of table rows matching a certain condition using Postgres exclusion constraint?

Cap on number of table rows matching a certain condition using Postgres exclusion constraint? - postgresql

If I have a Postgresql db schema for a tires table like this (a user has many tires):
user_id integer
description text
size integer
color text
created_at timestamp
and I want to enforce a constraint that says "a user can only have 4 tires".
A naive way to implement this would be to do:
SELECT COUNT(*) FROM tires WHERE user_id='123'
, compare the result to 4 and insert if it's lower than 4. It's suspectible to race conditions and so is a naive approach.
I don't want to add a count column. How can I do this (or can I do this) using an exclusion constraint? If it's not possible with exclusion constraints, what is the canonical way?

The "right" way is using locks here.
Firstly, you need to lock related rows. Secondly, insert new record if a user has less than 4 tires. Here is the SQL:
begin;
-- lock rows
select count(*)
from tires
where user_id = 123
for update;
-- throw an error of there are more than 3
-- insert new row
insert into tires (user_id, description)
values (123, 'tire123');
commit;
You can read more here How to perform conditional insert based on row count?

Related

WHERE "id" = nextval(seq) doesn't work properly

Table: test_seq
id (varchar(8))
raw_data (text)
cd_1
'I'm text'
cd_2
'I'm more text'
CREATE SEQUENCE cd_seq CYCLE START 1 MAXVALUE 2;
ALTER TABLE test_seq
ALTER COLUMN id SET DEFAULT ('cd_'||nextval('cd_seq'))::VARCHAR(8);
UPDATE test_seq
SET raw_data = 'New Text'
WHERE "id" = 'cd_'||nextval('cd_seq')::VARCHAR(8);
I am making a table that will store raw data as a short term backup, if for some reason the data ingestion fails and we need to go back without extracting it again. I'm trying to setup a way to have the records get replace when we have reached the ID limit.
So if I want 25 records in the table, when the SEQUENCE rolls back from the maximum ('cd_25') to ('cd_1'), I want raw_data to get updated to the new data.
I've come up with the SEQUENCE and the DEFAULT value for the first inserts but my UPDATE command won't update the records even when the "id" matches the 'cd_'||nextval('cd_seq') and it will sometimes UPDATE 9 rows at once.
I checked the values of "id" and 'cd_'||nextval('cd_seq') and they appear to be a match but the WHERE doesn't work properly.
Am I missing something or am I overcomplicating things?
Thank you

While I agree with Adrian Klaver's comments that this approach is pretty fragile due to how sequences work, if:
You can make sure the column default value is the only call to the sequence
You don't mind skipped rows if an insert fails, but sequence still increments its value
You can make sure all inserts handle conflicts like below
this can work.
Instead of trying to insert data by updating existing rows - which by the way forces you to prepopulate the table - just actually insert it and handle the conflict.
insert into test_seq
(text_column)
values
('e')
on conflict(id) do update set text_column=excluded.text_column;
This also lets you insert more than one row at once (up to the max size of your table, the length of your sequence), compared to your current update approach, as I do in the test below.
drop sequence if exists cd_seq cascade;
create sequence cd_seq cycle start 1 maxvalue 4;
drop table if exists test_seq cascade;
create table test_seq
(id text primary key default ('cd_'||nextval('cd_seq'))::VARCHAR(8),
text_column text);
insert into test_seq
(text_column)
values
('a'),
('b'),
('c'),
('d')
on conflict(id) do update set text_column=excluded.text_column;
select id, text_column from test_seq;
-- id | text_column
--------+-------------
-- cd_1 | a
-- cd_2 | b
-- cd_3 | c
-- cd_4 | d
--(4 rows)
insert into test_seq
(text_column)
values
('e'),
('f')
on conflict(id) do update set text_column=excluded.text_column;
select id, text_column from test_seq;
-- id | text_column
--------+-------------
-- cd_3 | c
-- cd_4 | d
-- cd_1 | e
-- cd_2 | f
--(4 rows)
If you try to insert more rows than the length of your sequence, you'll get
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same
command have duplicate constrained values.
If in your current solution you gave your update a source table to get multiple rows from and their number also exceeded the sequence length, it wouldn't pose a problem - in conflicting pairs you'd just get the last one. Here's your update, fixed (but still requires that your table is pre-populated):
with new as (
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'g' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'h' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'i' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'j' text_column union all
select ('cd_'||nextval('cd_seq'))::VARCHAR(8) id,'k' text_column)
update test_seq old
set text_column=new.text_column
from new
where old.id=new.id;

Update a very large table in PostgreSQL without locking

I have a very large table with 100M rows in which I want to update a column with a value on the basis of another column. The example query to show what I want to do is given below:
UPDATE mytable SET col2 = 'ABCD'
WHERE col1 is not null
This is a master DB in a live environment with multiple slaves and I want to update it without locking the table or effecting the performance of the live environment. What will be the most effective way to do it? I'm thinking of making a procedure that update rows in batches of 1000 or 10000 rows using something like limit but not quite sure how to do it as I'm not that familiar with Postgres and its pitfalls. Oh and both columns don't have any indexes but table has other columns that has.
I would appreciate a sample procedure code.
Thanks.

There is no update without locking, but you can strive to keep the row locks few and short.
You could simply run batches of this:
UPDATE mytable
SET col2 = 'ABCD'
FROM (SELECT id
FROM mytable
WHERE col1 IS NOT NULL
AND col2 IS DISTINCT FROM 'ABCD'
LIMIT 10000) AS part
WHERE mytable.id = part.id;
Just keep repeating that statement until it modifies less than 10000 rows, then you are done.
Note that mass updates don't lock the table, but of course they lock the updated rows, and the more of them you update, the longer the transaction, and the greater the risk of a deadlock.
To make that performant, an index like this would help:
CREATE INDEX ON mytable (col2) WHERE col1 IS NOT NULL;

Just an off-the-wall, out-of-the-box idea. Both col1 and col2 must be null to qualify precludes using an index, perhaps building a psudo index might be an option. This index would of course be a regular table but would only exist for a short period. Additionally, this relieves the lock time worry.
create table indexer (mytable_id integer primary key);
insert into indexer(mytable_id)
select mytable_id
from mytable
where col1 is null
and col2 is null;
The above creates our 'index' that contains only the qualifying rows. Now wrap an update/delete statement into an SQL function. This function updates the main table and deleted the updated rows from the 'index' and returns the number of rows remaining.
create or replace function set_mytable_col2(rows_to_process_in integer)
returns bigint
language sql
as $$
with idx as
( update mytable
set col2 = 'ABCD'
where col2 is null
and mytable_id in (select mytable_if
from indexer
limit rows_to_process_in
)
returning mytable_id
)
delete from indexer
where mytable_id in (select mytable_id from idx);
select count(*) from indexer;
$$;
When the functions returns 0 all rows initially selected have been processed. At this point repeat the entire process to pickup any rows added or updated which the initial selection didn't identify. Should be small number, and process is still available needed later.
Like I said just an off-the-wall idea.
Edited
Must have read into it something that wasn't there concerning col1. However the idea remains the same, just change the INSERT statement for 'indexer' to meet your requirements. As far as setting it in the 'index' no the 'index' contains a single column - the primary key of the big table (and of itself).
Yes you would need to run multiple times unless you give it the total number rows to process as the parameter. The below is a DO block that would satisfy your condition. It processes 200,000 on each pass. Change that to fit your need.
Do $$
declare
rows_remaining bigint;
begin
loop
rows_remaining = set_mytable_col2(200000);
commit;
exit when rows_remaining = 0;
end loop;
end; $$;

Delete row by row number in postgresql

I am new to postgreSql and I used following query to retrieve all the fields from database.
SELECT student.*,row_number() OVER () as rnum FROM student;
I don't know how to delete particular row by row number.Please give me some idea.
This is my table:
Column | Type
------------+------------------
name | text
rollno | integer
cgpa | double precision
department | text
branch | text

with a as
(
SELECT student.*,row_number() OVER () as rnum FROM student
)
delete from student where ctid in (select ctid from a where rnum =1) -- the
-- row_number you want
-- to delete
Quoted from PostgreSQL - System Columns
ctid :
The physical location of the row version within its table. Note
that although the ctid can be used to locate the row version very
quickly, a row's ctid will change each time it is updated or moved by
VACUUM FULL. Therefore ctid is useless as a long-term row identifier.
The OID, or even better a user-defined serial number, should be used
to identify logical rows.
Note : I strongly recommend you to use an unique filed in student table.
As per Craig's comment, I'll give another way to solve OP's issue it's a bit tricky
First create a unique column for table student, for this use below query
alter table student add column stu_uniq serial
this will produce stu_uniq with corresponding unique values for each row, so that OP can easily DELETE any row(s) using this stu_uniq

I don't know whether its a correct alternative for this problem.But it satisfies my problem.What my problem is I need to delete a row without help of anyone of it's column.I created table with OIDS,and with help of oid I deleted the rows.
CREATE TABLE Student(Name Text,RollNo Integer,Cgpa Float,Department Text,Branch Text)WITH OIDS;
DELETE FROM STUDENT WHERE oid=18789;
DELETE FROM STUDENT WHERE oid=18790;
Quoted from PostgreSQL - System Columns
Thanks to #WingedPanther for suggesting this idea.

You could try like this.
create table t(id int,name varchar(10));
insert into t values(1,'a'),(2,'b'),(3,'c'),(4,'d');
with cte as
(
select *,ROW_NUMBER()over(order by id) as rn from t
)
delete from cte where rn=1;
Cte in Postgres

Compact or renumber IDs for all tables, and reset sequences to max(id)?

After running for a long time, I get more and more holes in the id field. Some tables' id are int32, and the id sequence is reaching its maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, which would break the API.
I'd like to renumber them all. This may be not good practice, but good or bad is not concerned in this question. I want to renumber, especially, those very long IDs like "61789238", "548273826529524324". I don't know why they are so long, but shorter IDs are also easier to handle manually.
But it's not easy to compact IDs by hand because of references and constraints.
Does PostgreSQL itself support of ID renumbering? Or is there any plugin or maintaining utility for this job?
Maybe I can write some stored procedures? That would be very nice so I can schedule it once a year.

The question is old, but we got a new question from a desperate user on dba.SE after trying to apply what is suggested here. Find an answer with more details and explanation over there:
Compacting a sequence in PostgreSQL
The currently accepted answer will fail for most cases.
Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (OP mentions references and constraints.) Such constraints are checked after each row, so you most likely get unique violation errors trying. Details:
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example seems to retain the original sequence because physical storage still coincides with the desired order (inserted rows in desired order just a moment earlier), which is almost never the case in real world applications and completely unreliable.
The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:
BEGIN;
LOCK tbl;
-- remove all FK constraints to the column
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
-- for the simple case without FK references - or see below:
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
-- Update referencing value in FK columns at the same time (if any)
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
-- add all FK constraints to the column back
COMMIT;
This is also much faster for big tables, because checking PK (and FK) constraint(s) for every row costs a lot more than removing the constraint(s) and adding it (them) back.
If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.
Example for a table fk_tbl and a FK column fk_id:
WITH u1 AS (
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id
RETURNING t.id, t1.new_id -- return old and new ID
)
UPDATE fk_tbl f
SET fk_id = u1.new_id -- set to new ID
FROM u1
WHERE f.fk_id = u1.id; -- match on old ID
More in the referenced answer on dba.SE.

Assuming your ids are generated from a bignum sequence, just RESTART the sequence and update the table with idcolumn = DEFAULT.
CAVEAT: If this id column is used as a foreign key by other tables, make sure you have the on update cascade modifier turned on.
For example:
Create the table, put some data in, and remove a middle value:
db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
4 | are
5 | you
(4 rows)
Reset your sequence:
db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE
Update your data:
db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
3 | are
4 | you
(4 rows)

new id column and Foreign Key(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step)
\i tmp.sql
-- the test tables
CREATE TABLE one (
id serial NOT NULL PRIMARY KEY
, payload text
);
CREATE TABLE two (
id serial NOT NULL PRIMARY KEY
, the_fk INTEGER REFERENCES one(id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);
-- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;
INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;
-- make some gaps
DELETE FROM one WHERE id % 13 > 0;
-- SELECT * FROM two;
-- Add new keycolumns to one and two
ALTER TABLE one
ADD COLUMN new_id SERIAL NOT NULL UNIQUE
;
-- UPDATE:
-- This could need DEFERRABLE
-- Note since the update is only a permutation of the
-- existing values, we dont need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
ALTER TABLE two
ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
;
-- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
;
SELECT * FROM two;
-- The crucial part: the final renaming
-- (at this point it would be better not to allow other sessions
-- messing with the {one,two} tables ...
-- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);
ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);
-- Some checks.
-- (the automatically generated names for the indexes
-- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two
UPDATE: added the permutation of new_id (after creating it as a serial)
Funny thing is: it doesn't seem to need 'DEFERRABLE'.

*This script will work for postgresql
This is a generic solution that works for all cases
This query find the desciption of the fields of all tables from any database.
WITH description_bd AS (select colum.schemaname,coalesce(table_name,relname) as table_name , column_name, ordinal_position, column_default, data_type, is_nullable, character_maximum_length, is_updatable,description from
( SELECT columns.table_schema as schemaname,columns.table_name, columns.column_name, columns.ordinal_position, columns.column_default, columns.data_type, columns.is_nullable, columns.character_maximum_length, columns.character_octet_length, columns.is_updatable, columns.udt_name
FROM information_schema.columns
) colum
full join (SELECT schemaname, relid, relname,objoid, objsubid, description
FROM pg_statio_all_tables ,pg_description where pg_statio_all_tables.relid= pg_description.objoid ) descre
on descre.relname = colum.table_name and descre.objsubid=colum.ordinal_position and descre.schemaname=colum.schemaname )
This query propose a solution to fix the sequence of all database tables (this generates a query in the req field which fixes the sequence of the different tables).
It finds the number of records of the table and then increment this number by one.
SELECT table_name, column_name, ordinal_position,column_default,
data_type, is_nullable, character_maximum_length, is_updatable,
description,'SELECT setval('''||schemaname||'.'|| replace(replace(column_default,'''::regclass)',''),'nextval(''','')||''', (select max( '||column_name ||')+1 from '|| table_name ||' ), true);' as req
FROM description_bd where column_default like '%nextva%'

Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job.
It is called like this :
=> SELECT resequence('port','id','port_id_seq');
resequence
--------------
5090 -> 3919
Takes 3 parameters
name of table
name of column that is SERIAL
name of sequence that the SERIAL uses
The function returns a short report of what it has done, with the previous value of the sequence and the new value.
The function LOOPs over the table ORDERed by the named column and makes an UPDATE for each row. Then sets the new value for the sequence. That's it.
The order of the values is preserved.
No ADDing and DROPing of temporary columns or tables involved.
No DROPing and ADDing of constraints and foreign keys needed.
Of course You better have ON UPDATE CASCADE for those foreign keys.
The code :
CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
_old BIGINT;_new BIGINT := 0;
BEGIN
FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
_new=_new+1;
EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
END LOOP;
RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;

SQLite - a smart way to remove and add new objects

I have a table in my database and I want for each row in my table to have an unique id and to have the rows named sequently.
For example: I have 10 rows, each has an id - starting from 0, ending at 9. When I remove a row from a table, lets say - row number 5, there occurs a "hole". And afterwards I add more data, but the "hole" is still there.
It is important for me to know exact number of rows and to have at every row data in order to access my table arbitrarily.
There is a way in sqlite to do it? Or do I have to manually manage removing and adding of data?
Thank you in advance,
Ilya.

It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.

I strongly suggest that you re-think your design. In my opinion your asking yourself for troubles in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field using PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
Sqlite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.

If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.

If you want to reclaim deleted row ids the VACUUM command or pragma may be what you seek,
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum