I'm trying to write a rule on a view to delete tuples from the component tables, but so far can only remove data from one of them. I've used postgres with basic views for a while, but I don't have any experience with rules on views.
I wrote a stupid little test case to figure out/show my problem. There's only one parent tuple per child tuple in this example (my actual schema isn't actually like this of course).
Component tables:
CREATE TABLE parent(
id serial PRIMARY KEY,
p_data integer NOT NULL UNIQUE
);
CREATE TABLE child(
id serial PRIMARY KEY,
parent_id integer NOT NULL UNIQUE REFERENCES parent(id),
c_data integer NOT NULL
);
View:
CREATE TABLE child_view(
id integer,
p_data integer,
c_data integer
);
CREATE RULE "_RETURN" AS ON SELECT TO child_view DO INSTEAD
SELECT child.id, p_data, c_data
FROM parent JOIN child ON (parent_id=parent.id);
Problem delete rule:
CREATE RULE child_delete AS ON DELETE TO child_view DO INSTEAD(
DELETE FROM child WHERE id=OLD.id;
DELETE FROM parent WHERE p_data=OLD.p_data;
);
The intent of the above rule is to remove tuples referenced in the view from the component tables. The WHERE p_data=OLD.p_data seems odd to me, but I don't see how else to reference the desired tuple in the parent table.
Here's what happens when I try to use the above rule:
>SELECT * FROM child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 10
2 | 2 | 11
3 | 3 | 12
(3 rows)
>DELETE FROM child_view WHERE id=3;
DELETE 0
>SELECT * FROM child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 10
2 | 2 | 11
(2 rows)
But looking at the parent table, the second part of the delete isn't working (id=3 "should" have been deleted):
>SELECT * FROM parent;
id | p_data
----+--------
1 | 1
2 | 2
3 | 3
(3 rows)
How should I write the deletion rule to remove both child and parent tuples?
This is using postgres v9.
Any help is appreciated. Also, pointers to any materials covering rules on views beyond the postgres docs (unless I've obviously missed something) would be appreciated. Thanks.
EDIT: as jmz points out, it would be easier to use a cascading delete than a rule here, but that approach doesn't work for my actual schema.
What you're seeing with the rule problem is that the rule system doesn't handle the data atomically. The first delete is executed regardless of the order of the two statements in the DO INSTEAD rule. The second statement is never executed, because the row to which OLD.id refers has been removed from the view. You could use a LEFT JOIN, but that won't help you because of the example table design (it may work on your actual database schema).
The fundamental problem, as I see it, is that you're treating the rule system as if it were a trigger.
Your best option is to use foreign keys and ON DELETE CASCADE instead of rules. With them your example schema would work too: You'd only need on delete for the parent table to get rid of all the children.
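A minimal sketch of what that could look like for the example schema (the only change is the ON DELETE CASCADE clause on the child's foreign key):
CREATE TABLE child(
    id serial PRIMARY KEY,
    parent_id integer NOT NULL UNIQUE REFERENCES parent(id) ON DELETE CASCADE,
    c_data integer NOT NULL
);
-- deleting the parent now removes its child row automatically
DELETE FROM parent WHERE id = 3;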
What you want to do will work fine. But you made a left turn on this:
CREATE TABLE child_view(
id integer,
p_data integer,
c_data integer
);
CREATE RULE "_RETURN" AS ON SELECT TO child_view DO INSTEAD
SELECT child.id, p_data, c_data
FROM parent JOIN child ON (parent_id=parent.id);
You want an actual view here, not a table. That is why the delete does not work.
CREATE VIEW child_view AS SELECT
child.id,
p_data,
c_data
FROM parent
JOIN child ON (parent_id=parent.id)
;
Replace the top with the bottom and it will work perfectly (it did when I tested it). The reason the delete does not work is that it is trying to delete rows from the table child_view, which is of course empty! It does not execute the ON SELECT DO INSTEAD rule, so it operates on the real table child_view. People may poo-poo using rules, but if they cannot see such an obvious mistake, I wonder how much they know.
I have used rules successfully in defining interfaces to enforce business rules. They can lead to elegant solutions in ways triggers cannot.
Note: I only recommend this for making writable views for an interface. You could do clever things like checking constraints across tables - and you may be asking for trouble there. That kind of stuff really should be done with triggers.
Edit: script per request
-- set this, as you may have had an error if you are running
-- this from a script and not noticed it among all the NOTICES
\set ON_ERROR_STOP
drop table if exists parent cascade;
drop table if exists child cascade;
CREATE TABLE parent(
id serial PRIMARY KEY,
p_data integer NOT NULL UNIQUE
);
CREATE TABLE child(
id serial PRIMARY KEY,
parent_id integer NOT NULL UNIQUE REFERENCES parent(id),
c_data integer NOT NULL
);
CREATE VIEW child_view AS SELECT
child.id,
p_data,
c_data
FROM parent
JOIN child ON (parent_id=parent.id)
;
CREATE RULE child_delete AS ON DELETE TO child_view DO INSTEAD(
DELETE FROM child WHERE id=OLD.id;
DELETE FROM parent WHERE p_data=OLD.p_data;
);
insert into parent (p_data) values (1), (2), (3);
insert into child (parent_id, c_data) values (1, 1), (2, 2), (3, 3);
select * from child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
(3 rows)
delete from child_view where id=3;
DELETE 0
select * from child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 1
2 | 2 | 2
(2 rows)
I have table:
book_id | part | name
1 | 1 | chap 1
1 | 2 | chap 2
1 | 3 | chap 3
1 | 4 | chap 4
Primary key is book_id and part.
How can I delete part 2 and update the order of the parts to get:
book_id | part | name
1 | 1 | chap 1
1 | 2 | chap 3
1 | 3 | chap 4
I can do a transaction and first delete part 2, but how can I then update the part column without getting a duplicate primary key error?
I would choose a different approach. Instead of persisting the part number, persist the order of the parts:
CREATE TABLE book_part (
book_id bigint NOT NULL,
part_order real NOT NULL,
name text NOT NULL,
PRIMARY KEY (book_id, part_order)
);
The first part that gets entered gets a part_order of 0.0. If you add a part at the beginning or the end, you assign a part_order 1.0 less or greater than the current minimum or maximum. If you insert a part between two existing parts, you assign a part_order that is the arithmetic mean of the two adjacent parts.
An example:
-- insert the first part
INSERT INTO book_part VALUES (1, 0.0, 'Introduction');
-- insert a part at the end
INSERT INTO book_part VALUES (1, 1.0, 'Getting started with PostgreSQL');
-- insert a part between the two existing parts
INSERT INTO book_part VALUES (1, 0.5, 'The history of PostgreSQL');
-- adding yet another part between two existing parts
INSERT INTO book_part VALUES (1, 0.25, 'An introductory example');
The actual part number is calculated when you query the table:
SELECT book_id,
row_number() OVER (PARTITION BY book_id ORDER BY part_order) AS part,
name
FROM book_part;
The beauty of that is that you don't need to update a lot of rows when you add or delete a part.
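For example, deleting a part needs no renumbering at all; the part numbers simply come out right on the next query:
-- delete a part; no other rows need to change
DELETE FROM book_part WHERE book_id = 1 AND part_order = 0.5;
-- the remaining parts are renumbered on the fly by row_number()
SELECT book_id,
       row_number() OVER (PARTITION BY book_id ORDER BY part_order) AS part,
       name
FROM book_part;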
Unlike most RDBMS, PostgreSQL does not support updating a primary key to values that transiently collide with existing ones without the use of a deferred constraint.
In fact, PostgreSQL executes the update row by row, which leads it to find a "phantom" duplicate key, while other RDBMS that respect the standard use a set-based approach (MS SQL Server, Oracle, DB2...).
So you must use a deferred constraint:
ALTER TABLE book_part
    ALTER CONSTRAINT <PK constraint name> DEFERRABLE INITIALLY IMMEDIATE;
This is a severe limitation of PG... See "5 – The hard way to udpates unique values" in
http://mssqlserver.fr/postgresql-vs-sql-server-mssql-part-3-very-extremely-detailed-comparison/
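With the constraint made deferrable, the delete-and-shift from the question can run in one transaction; a sketch, reusing the book_part name from above with the question's original (book_id, part) key, and assuming the PK constraint is named book_part_pkey (a hypothetical name):
BEGIN;
SET CONSTRAINTS book_part_pkey DEFERRED;  -- hypothetical constraint name
DELETE FROM book_part WHERE book_id = 1 AND part = 2;
UPDATE book_part SET part = part - 1
WHERE book_id = 1 AND part > 2;           -- transient duplicates are tolerated
COMMIT;                                   -- the PK is checked once, here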
As school work, we're supposed to create a table that logs all operations done by users on another table. To be more clear: say I have table1 and logtable. table1 can contain any info (names, ids, job, etc.); logtable contains info on who did what, and when, on table1.
Using a function and a trigger I managed to get the INSERT, DELETE and UPDATE operations logged in logtable, but we're also supposed to keep a log of SELECTs. To be more specific about the SELECTs: if you do a SELECT on a view, this is supposed to be logged into logtable via an INSERT; essentially, logtable is supposed to get a new row with information telling that somebody did a SELECT.
My problem is that I can't figure out any way to accomplish this, as SELECTs can't make use of triggers and in turn can't make use of functions, and rules don't allow for two different operations to take place. The only thing that came close was using query logs; however, as the database is the school's and not mine, I can't make any use of them.
Here is a rough example of what I'm working with (in reality tstamp has hours minutes and such):
id operation hid tablename who tstamp val_new val_old
x INSERT x table1 name YYYY-MM-DD newValues previousValues
That works as intended, but what I also need to get to work is this (Note: Whether val_new and old come out as empty or not in this case is not a concern):
id operation hid tablename who tstamp val_new val_old
x SELECT x table1 name YYYY-MM-DD NULL previousValues
Any and all help is appreciated.
Here is an example:
CREATE TABLE public.test (id integer PRIMARY KEY, value integer);
INSERT INTO test VALUES (1,42),(2,13);
CREATE TABLE test_log(id serial primary key, dbuser varchar,datetime timestamp);
-- get_test() inserts username / timestamp into log, then returns all rows
-- of test
CREATE OR REPLACE FUNCTION get_test() RETURNS SETOF test AS $$
    INSERT INTO test_log (dbuser, datetime) VALUES (current_user, now());
    SELECT * FROM test;
$$ LANGUAGE sql;
-- now a view returns the full row set of test by instead calling our function
CREATE VIEW test_v AS SELECT * FROM get_test();
SELECT * FROM test_v;
id | value
----+-------
1 | 42
2 | 13
(2 rows)
SELECT * FROM test_log;
id | dbuser | datetime
----+----------+----------------------------
1 | postgres | 2020-11-30 12:42:00.188341
(1 row)
If your table has many rows and/or the selects are complex, you don't want to use this view for performance reasons.
I’m querying from a table that has repeated uuids, and I want to remove duplicates. I also want to exclude some irrelevant data which requires joining on another table. I can remove duplicates and then exclude irrelevant data, or I can switch the order and exclude then remove duplicates. Intuitively, I feel like if anything, removing duplicates then joining should produce more rows than joining and then removing duplicates, but that is the opposite of what I’m seeing. What am I missing here?
In this one, I remove duplicates in the first subquery and filter in the second, and I get 500k rows:
with tbl1 as (
select distinct on (uuid) uuid, foreign_key
from original_data
where date > some_date
),
tbl2 as (
select uuid
from tbl1
left join other_data
on tbl1.foreign_key = other_data.id
where other_data.category <> something
)
select * from tbl2
If I filter then remove duplicates, I get 550k rows:
with tbl1 as (
select uuid, foreign_key
from original_data
where date > some_date
),
tbl2 as (
select uuid
from tbl1
left join other_data
on tbl1.foreign_key = other_data.id
where other_data.category <> something
),
tbl3 as (
select distinct on (uuid) uuid
from tbl2
)
select * from tbl3
Is there an explanation here?
Does original_data.foreign_key have a foreign key constraint referencing other_data.id, while allowing foreign_keys that don't link to any id in other_data?
Is the other_data.category or original_data.foreign_key column missing a NOT NULL constraint?
In either of these cases, Postgres would filter out all records with
a missing link (foreign_key is null)
a broken link (foreign_key doesn't match any id in other_data)
a link to an other_data record whose category is null
in both of your approaches, regardless of whether they're duplicates or not, because other_data.category <> something evaluates to null for them, which does not satisfy the WHERE clause. That, combined with the missing ORDER BY causing DISTINCT ON to drop different duplicates each time, could result in dropping duplicates that then get filtered out in tbl2 in the first approach, but not in the second.
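You can see the null propagation directly:
SELECT NULL <> 'something' AS comparison;  -- yields NULL, not true or false,
                                           -- so the WHERE clause rejects the row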
Example:
pgsql122=# select * from original_data;
uuid | foreign_key | comment
------+-------------+---------------------------------------------------
1 | 1 | correct, non-duplicate record with a correct link
3 | 2 | duplicate record with a broken link
3 | 1 | duplicate record with a correct link
4 | null | duplicate record with a missing link
4 | 1 | duplicate record with a correct link
5 | 3 | duplicate record with a correct link, but a null category behind it
5 | 1 | duplicate record with a correct link
6 | null | correct, non-duplicate record with a missing link
7 | 2 | correct, non-duplicate record with a broken link
8 | 3 | correct, non-duplicate record with a correct link, but a null category behind it
pgsql122=# select * from other_data;
id | category
----+----------
1 | a
3 | null
Both of your approaches keep uuid 1 and eliminate uuids 6, 7 and 8 even though they're unique.
Your first approach randomly keeps between 0 and 3 out of the 3 pairs of duplicates (uuids 3, 4 and 5), depending on which one in each pair gets discarded by DISTINCT ON.
Your second approach always keeps one record for each of uuids 3, 4 and 5. Each clone with a missing link, a broken link, or a link with a null category behind it is already gone by the time you discard duplicates.
As #a_horse_with_no_name suggested, an ORDER BY will make DISTINCT ON consistent and predictable, but only as long as records vary in the columns used for ordering. It also won't help if you have other issues, like the ones suggested above.
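A deterministic version of the first subquery might look like this (a sketch; the tie-breaker column in the ORDER BY is an assumption, use whatever defines which duplicate you want to keep):
select distinct on (uuid) uuid, foreign_key
from original_data
where date > some_date
order by uuid, foreign_key nulls last;  -- prefer duplicates that have a link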
After running for a long time, I get more and more holes in the id field. Some tables' ids are int32, and the id sequence is reaching its maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, which would break the API.
I'd like to renumber them all. This may not be good practice, but good or bad is not the concern of this question. I want to renumber, especially, those very long IDs like "61789238" and "548273826529524324". I don't know why they are so long, but shorter IDs are also easier to handle manually.
But it's not easy to compact IDs by hand because of references and constraints.
Does PostgreSQL itself support ID renumbering? Or is there any plugin or maintenance utility for this job?
Maybe I can write some stored procedures? That would be very nice so I can schedule it once a year.
The question is old, but we got a new question from a desperate user on dba.SE after trying to apply what is suggested here. Find an answer with more details and explanation over there:
Compacting a sequence in PostgreSQL
The currently accepted answer will fail for most cases.
Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (OP mentions references and constraints.) Such constraints are checked after each row, so you most likely get unique violation errors trying. Details:
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example seems to retain the original sequence because physical storage still coincides with the desired order (the rows were inserted in the desired order just a moment earlier), which is almost never the case in real-world applications and is completely unreliable.
The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:
BEGIN;
LOCK tbl;
-- remove all FK constraints to the column
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
-- for the simple case without FK references - or see below:
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
-- Update referencing value in FK columns at the same time (if any)
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
-- add all FK constraints to the column back
COMMIT;
This is also much faster for big tables, because checking PK (and FK) constraint(s) for every row costs a lot more than removing the constraint(s) and adding it (them) back.
If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.
Example for a table fk_tbl and a FK column fk_id:
WITH u1 AS (
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id
RETURNING t.id, t1.new_id -- return old and new ID
)
UPDATE fk_tbl f
SET fk_id = u1.new_id -- set to new ID
FROM u1
WHERE f.fk_id = u1.id; -- match on old ID
More in the referenced answer on dba.SE.
Assuming your ids are generated from a bigint sequence, just RESTART the sequence and update the table with idcolumn = DEFAULT.
CAVEAT: If this id column is used as a foreign key by other tables, make sure you have the on update cascade modifier turned on.
For example:
Create the table, put some data in, and remove a middle value:
db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
4 | are
5 | you
(4 rows)
Reset your sequence:
db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE
Update your data:
db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
3 | are
4 | you
(4 rows)
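To illustrate the caveat above: any table referencing foo.id needs ON UPDATE CASCADE so the renumbering propagates. A sketch with a hypothetical referencing table bar (note that a foreign key also requires a unique constraint on foo.id, which the example above doesn't declare):
-- a foreign key needs a unique constraint on the referenced column
alter table foo add primary key (id);
-- hypothetical referencing table; ON UPDATE CASCADE propagates renumbering
create table bar (
    id bigint primary key,
    foo_id bigint references foo(id) on update cascade
);
With that in place, "update foo set id = DEFAULT" rewrites bar.foo_id to match; whether the row-by-row update collides with existing keys depends on the data (see the caveats in the answer above).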
Add a new id column and new Foreign Key(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step.)
\i tmp.sql
-- the test tables
CREATE TABLE one (
id serial NOT NULL PRIMARY KEY
, payload text
);
CREATE TABLE two (
id serial NOT NULL PRIMARY KEY
, the_fk INTEGER REFERENCES one(id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);
-- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;
INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;
-- make some gaps
DELETE FROM one WHERE id % 13 > 0;
-- SELECT * FROM two;
-- Add new keycolumns to one and two
ALTER TABLE one
ADD COLUMN new_id SERIAL NOT NULL UNIQUE
;
-- UPDATE:
-- This could need DEFERRABLE.
-- Note: since the update is only a permutation of the
-- existing values, we don't need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
ALTER TABLE two
ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
;
-- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
;
SELECT * FROM two;
-- The crucial part: the final renaming
-- (at this point it would be better not to allow other sessions
-- messing with the {one,two} tables ...
-- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);
ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);
-- Some checks.
-- (the automatically generated names for the indexes
-- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two
UPDATE: added the permutation of new_id (after creating it as a serial)
Funny thing is: it doesn't seem to need 'DEFERRABLE'.
This script will work for PostgreSQL. It is a generic solution that works for all cases.
This query finds the description of the fields of all tables in any database:
WITH description_bd AS (
    SELECT colum.schemaname,
           coalesce(table_name, relname) AS table_name,
           column_name, ordinal_position, column_default, data_type,
           is_nullable, character_maximum_length, is_updatable, description
    FROM (
        SELECT columns.table_schema AS schemaname, columns.table_name,
               columns.column_name, columns.ordinal_position, columns.column_default,
               columns.data_type, columns.is_nullable, columns.character_maximum_length,
               columns.character_octet_length, columns.is_updatable, columns.udt_name
        FROM information_schema.columns
    ) colum
    FULL JOIN (
        SELECT schemaname, relid, relname, objoid, objsubid, description
        FROM pg_statio_all_tables, pg_description
        WHERE pg_statio_all_tables.relid = pg_description.objoid
    ) descre
        ON descre.relname = colum.table_name
       AND descre.objsubid = colum.ordinal_position
       AND descre.schemaname = colum.schemaname
)
This query proposes a solution to fix the sequences of all the database tables (it generates a statement in the req field which fixes the sequence of each table). It takes the maximum value of the id column of each table and increments it by one:
SELECT table_name, column_name, ordinal_position, column_default,
       data_type, is_nullable, character_maximum_length, is_updatable,
       description,
       'SELECT setval(''' || schemaname || '.'
           || replace(replace(column_default, '''::regclass)', ''), 'nextval(''', '')
           || ''', (select max( ' || column_name || ')+1 from ' || table_name || ' ), true);' AS req
FROM description_bd
WHERE column_default LIKE '%nextva%'
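For a table t with a serial column id backed by a sequence public.t_id_seq (hypothetical names), the generated req value would look like this:
SELECT setval('public.t_id_seq', (select max( id )+1 from t ), true);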
Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job.
It is called like this:
=> SELECT resequence('port','id','port_id_seq');
resequence
--------------
5090 -> 3919
It takes 3 parameters:
the name of the table
the name of the column that is SERIAL
the name of the sequence that the SERIAL uses
The function returns a short report of what it has done, with the previous value of the sequence and the new value.
The function LOOPs over the table ORDERed by the named column and makes an UPDATE for each row. Then sets the new value for the sequence. That's it.
The order of the values is preserved.
No ADDing and DROPing of temporary columns or tables involved.
No DROPing and ADDing of constraints and foreign keys needed.
Of course, you had better have ON UPDATE CASCADE on those foreign keys.
The code:
CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
_old BIGINT;
_new BIGINT := 0;
BEGIN
FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
_new=_new+1;
EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
END LOOP;
RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;
There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for a select star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
In the same situation, SQL Server 2008 shows a Clustered Index Scan. What is the reason?
select * with no where clause means: read every row in the table and fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, and read from the file.
What happens when you do a full table scan? You go to the first rowid in the table, then read on through the table to the end.
Which one of these is faster, given the table above? The full table scan. Why? Because it skips having to go to the index, retrieve values, and then go back to where the table lives to fetch the rows.
To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this: the Clustered Index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan", or what Oracle calls "TABLE ACCESS FULL".
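By contrast, with a selective predicate on the leading primary key columns, both engines can typically use the index structure instead of scanning everything (Oracle: INDEX RANGE SCAN; SQL Server: Clustered Index Seek). A sketch:
-- only now does the index pay off: navigate the B-tree to the matching rows
SELECT * FROM temp WHERE IDR = 1 AND IDS = 2;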