I have a table:
book_id | part | name
1 | 1 | chap 1
1 | 2 | chap 2
1 | 3 | chap 3
1 | 4 | chap 4
The primary key is (book_id, part).
How can I delete part 2 and renumber the remaining parts to get:
book_id | part | name
1 | 1 | chap 1
1 | 2 | chap 3
1 | 3 | chap 4
I can do a transaction and first delete part 2, but how can I then update the part column without getting a duplicate primary key error?
I would choose a different approach. Instead of persisting the part number, persist the order of the parts:
CREATE TABLE book_part (
book_id bigint NOT NULL,
part_order real NOT NULL,
name text NOT NULL,
PRIMARY KEY (book_id, part_order)
);
The first part that gets entered gets a part_order of 0.0. If you add a part at the beginning or the end, you assign a part_order 1.0 less than the current minimum or 1.0 more than the current maximum. If you insert a part between two existing parts, you assign a part_order that is the arithmetic mean of the two adjacent parts' values.
An example:
-- insert the first part
INSERT INTO book_part VALUES (1, 0.0, 'Introduction');
-- insert a part at the end
INSERT INTO book_part VALUES (1, 1.0, 'Getting started with PostgreSQL');
-- insert a part between the two existing parts
INSERT INTO book_part VALUES (1, 0.5, 'The history of PostgreSQL');
-- adding yet another part between two existing parts
INSERT INTO book_part VALUES (1, 0.25, 'An introductory example');
The actual part number is calculated when you query the table:
SELECT book_id,
row_number() OVER (PARTITION BY book_id ORDER BY part_order) AS part,
name
FROM book_part;
The beauty of that is that you don't need to update a lot of rows when you add or delete a part.
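To illustrate with the table and data above: deleting a part is a single-row operation, and the numbering repairs itself on the next query.

```sql
-- deleting a part: one row gone, no renumbering needed
DELETE FROM book_part
WHERE book_id = 1 AND part_order = 0.5;

-- the row_number() query above now yields a gap-free part sequence again
```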
Unlike most RDBMS, PostgreSQL does not support updating primary key values that transiently collide with existing ones unless you use a deferred constraint.
In fact, PostgreSQL executes the update row by row, which leads it to find a "phantom" duplicate key, while other RDBMS that respect the standard use a set-based approach (MS SQL Server, Oracle, DB2...).
So you must use a deferred constraint.
ALTER TABLE book_part
ALTER CONSTRAINT ??? *PK constraint name* ??? DEFERRABLE INITIALLY IMMEDIATE;
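Once the constraint is deferrable, the original delete-and-renumber can run in one transaction. A sketch, assuming the question's table is named book and its PK constraint is named book_pkey (substitute your actual names):

```sql
BEGIN;
-- defer the PK check until COMMIT, so intermediate duplicates are tolerated
SET CONSTRAINTS book_pkey DEFERRED;
DELETE FROM book WHERE book_id = 1 AND part = 2;
UPDATE book SET part = part - 1 WHERE book_id = 1 AND part > 2;
COMMIT;
```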
This is a severe limitation of PG... See "5 – The hard way to udpates unique values" in
http://mssqlserver.fr/postgresql-vs-sql-server-mssql-part-3-very-extremely-detailed-comparison/
I’m querying from a table that has repeated uuids, and I want to remove duplicates. I also want to exclude some irrelevant data, which requires joining on another table. I can remove duplicates and then exclude the irrelevant data, or I can switch the order: exclude first, then remove duplicates. Intuitively, I feel that if anything, removing duplicates and then joining should produce more rows than joining and then removing duplicates, but I'm seeing the opposite. What am I missing here?
In this one, I remove duplicates in the first subquery and filter in the second, and I get 500k rows:
with tbl1 as (
select distinct on (uuid) uuid, foreign_key
from original_data
where date > some_date
),
tbl2 as (
select uuid
from tbl1
left join other_data
on tbl1.foreign_key = other_data.id
where other_data.category <> something
)
select * from tbl2
If I filter then remove duplicates, I get 550k rows:
with tbl1 as (
select uuid, foreign_key
from original_data
where date > some_date
),
tbl2 as (
select uuid
from tbl1
left join other_data
on tbl1.foreign_key = other_data.id
where other_data.category <> something
),
tbl3 as (
select distinct on (uuid) uuid
from tbl2
)
select * from tbl3
Is there an explanation here?
Does original_data.foreign_key lack a foreign key constraint referencing other_data.id, allowing values that don't link to any id in other_data?
Is the other_data.category or original_data.foreign_key column missing a NOT NULL constraint?
In either of these cases Postgres would filter out all records with
a missing link (foreign_key is null),
a broken link (foreign_key doesn't match any id in other_data), or
a link to an other_data record with a category set to null
in both of your approaches, regardless of whether they're duplicates or not, because other_data.category <> something evaluates to null for them, which does not satisfy the WHERE clause. That, combined with the missing ORDER BY causing DISTINCT ON to drop a different duplicate each time, could result in dropping the duplicates that then get filtered out in tbl2 in the first approach, but not in the second.
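The NULL behaviour can be seen in isolation; a minimal sketch:

```sql
-- category <> 'b' evaluates to NULL (not true) for a NULL category,
-- so WHERE silently drops that row
SELECT count(*)
FROM (VALUES ('a'), (NULL)) AS t(category)
WHERE t.category <> 'b';
-- count is 1: only the 'a' row satisfies the predicate
```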
Example:
pgsql122=# select * from original_data;
uuid | foreign_key | comment
------+-------------+---------------------------------------------------
1 | 1 | correct, non-duplicate record with a correct link
3 | 2 | duplicate record with a broken link
3 | 1 | duplicate record with a correct link
4 | null | duplicate record with a missing link
4 | 1 | duplicate record with a correct link
5 | 3 | duplicate record with a correct link, but a null category behind it
5 | 1 | duplicate record with a correct link
6 | null | correct, non-duplicate record with a missing link
7 | 2 | correct, non-duplicate record with a broken link
8 | 3 | correct, non-duplicate record with a correct link, but a null category behind it
pgsql122=# select * from other_data;
id | category
----+----------
1 | a
3 | null
Both of your approaches keep uuid 1 and eliminate uuid 6, 7 and 8 even though they're unique.
Your first approach randomly keeps between 0 and 3 out of the 3 pairs of duplicates (uuid 3, 4 and 5), depending on which one in each pair gets discarded by DISTINCT ON.
Your second approach always keeps one record for each uuid 3, 4 and 5. Each clone with missing link, a broken link or a link with a null category behind it is already gone by the time you discard duplicates.
As @a_horse_with_no_name suggested, an ORDER BY will make DISTINCT ON consistent and predictable, but only as long as the records vary in the columns used for ordering. It also won't help if you have other issues, like the ones I described above.
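A sketch of the first subquery with a deterministic tiebreaker, assuming you want to prefer the duplicate that actually carries a link:

```sql
select distinct on (uuid) uuid, foreign_key
from original_data
where date > some_date
-- nulls last prefers the duplicate whose foreign_key is populated
order by uuid, foreign_key nulls last;
```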
I am currently looking into an efficient way to allocate data into a partitioned table. Is it possible to use postgres/psql to COPY data into a specific table partition (instead of using INSERT)?
According to the documentation on COPY here:
COPY FROM can be used with plain, foreign, or partitioned tables or with views that have INSTEAD OF INSERT triggers.
And according to the documentation on partitioning here:
Be aware that COPY ignores rules. If you want to use COPY to insert data, you'll need to copy into the correct partition table rather than into the master. COPY does fire triggers, so you can use it normally if you use the trigger approach.
From my understanding of the aforementioned resources, it seems possible to COPY into a partition; however, I can't find any examples of or support for that online.
In other words, can I write something like:
COPY some_table_partition_one FROM '/some_dir/some_file'
COPY to a partitioned table was introduced in v11:
Allow INSERT, UPDATE, and COPY on partitioned tables to properly route rows to foreign partitions (Etsuro Fujita, Amit Langote)
But COPY directly to a partition is possible in all releases since v10, where declarative partitioning was introduced.
It seems we forgot to remove the second quotation from the documentation; it refers to the old trigger-based partitioning and no longer applies to declarative partitioning.
It is possible at least with PG 12.2:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE
CREATE TABLE measurement_y2020m03 PARTITION OF measurement
FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');
CREATE TABLE
CREATE TABLE measurement_y2020m04 PARTITION OF measurement
FOR VALUES FROM ('2020-04-01') TO ('2020-05-01');
CREATE TABLE
insert into measurement values (1, current_date, 10,100);
INSERT 0 1
select * from measurement;
city_id | logdate | peaktemp | unitsales
---------+------------+----------+-----------
1 | 2020-03-27 | 10 | 100
(1 row)
cat /tmp/m.dat
4,2020-04-01,40,400
copy measurement_y2020m04 from '/tmp/m.dat' delimiter ',';
COPY 1
select * from measurement;
city_id | logdate | peaktemp | unitsales
---------+------------+----------+-----------
1 | 2020-03-27 | 10 | 100
4 | 2020-04-01 | 40 | 400
(2 rows)
I want to store a full version history of each row every time an update is made to an amount-sensitive table.
So far, I have decided on the following approach:
Do not allow updates.
Every time an update is made, create a new
entry in the table.
However, I am undecided on the best database structure design for this change.
Current Structure
Primary Key: id
id(int) | amount(decimal) | other_columns
First Approach
Composite Primary Key: id, version
id(int) | version(int) | amount(decimal) | change_reason
1 | 1 | 100 |
1 | 2 | 20 | correction
Second Approach
Primary Key: id
Uniqueness Index on [origin_id, version]
id(int) | origin_id(int) | version(int) | amount(decimal) | change_reason
1 | NULL | 1 | 100 | NULL
2 | 1 | 2 | 20 | correction
I would suggest a new table which stores a unique id for each item. This serves as a lookup table for all available items.
item Table:
id(int)
1000
For the table which stores all changes for an item, let's call it the item_changes table, item_id is a FOREIGN KEY to the item table's id. The relationship between the item table and the item_changes table is one-to-many.
item_changes Table:
id(int) | item_id(int) | version(int) | amount(decimal) | change_reason
1 | 1000 | 1 | 100 | NULL
2 | 1000 | 2 | 20 | correction
With this, item_id will never be NULL, as it is a valid FOREIGN KEY to the item table.
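A minimal DDL sketch of the suggested design (column types are assumptions based on the question):

```sql
CREATE TABLE item (
    id int PRIMARY KEY
);

CREATE TABLE item_changes (
    id serial PRIMARY KEY,
    item_id int NOT NULL REFERENCES item(id),
    version int NOT NULL,
    amount decimal NOT NULL,
    change_reason text,
    UNIQUE (item_id, version)   -- one row per item per version
);
```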
The best method is to use Version Normal Form (VNF). Here is an answer I gave for a neat way to track all changes to specific fields of specific tables.
The static table contains the static data, such as PK and other attributes which do not change over the life of the entity or such changes need not be tracked.
The version table contains all dynamic attributes that need to be tracked. The best design uses a view which joins the static table with the current version from the version table, as the current version is probably what your apps need most often. Triggers on the view maintain the static/versioned design without the app needing to know anything about it.
The link above also contains a link to a document which goes into much more detail including queries to get the current version or to "look back" at any version you need.
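A hedged sketch of the static/version split and its current-version view; the table and column names here are illustrative, not taken from the linked answer:

```sql
-- static attributes that never change over the entity's life
CREATE TABLE item_static (
    id int PRIMARY KEY
);

-- one row per tracked change
CREATE TABLE item_version (
    item_id int NOT NULL REFERENCES item_static(id),
    version int NOT NULL,
    amount decimal NOT NULL,
    PRIMARY KEY (item_id, version)
);

-- the view the apps would normally query: latest version per item
CREATE VIEW item_current AS
SELECT s.id, v.amount
FROM item_static s
JOIN item_version v ON v.item_id = s.id
WHERE v.version = (SELECT max(version)
                   FROM item_version
                   WHERE item_id = s.id);
```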
Why not go for SCD-2 (Slowly Changing Dimension, Type 2), a methodology that addresses exactly this problem? Here are the advantages of SCD-2 and an example of its use; it is a standard design pattern for databases.
Type 2 - Creating a new additional record. In this methodology, the full history of dimension changes is kept in the database. You capture an attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and new rows contain the natural key (or other durable identifier) as attributes. 'Effective date' and 'current indicator' columns are also used in this method: there can be only one record with the current indicator set to 'Y', and for the 'effective date' columns, i.e. start_date and end_date, the end_date of the current record is usually set to the value 9999-12-31. Introducing changes to the dimensional model in type 2 can be a very expensive database operation, so it is not recommended in dimensions where new attributes may be added in the future.
id | amount | start_date  | end_date    | current_flag
---+--------+-------------+-------------+-------------
 1 | 100    | 01-Apr-2018 | 02-Apr-2018 | N
 2 | 80     | 04-Apr-2018 | NULL        | Y
Detailed explanation:
Here, all you need to do is add 3 extra columns (START_DATE, END_DATE, CURRENT_FLAG) to track your records properly. When a record is first inserted at the source, the table stores it as:
id | amount | start_date  | end_date | current_flag
---+--------+-------------+----------+-------------
 1 | 100    | 01-Apr-2018 | NULL     | Y
And when the same record is updated, you set the END_DATE of the previous record to the current system date and its CURRENT_FLAG to 'N', then insert a second record as below. This way you can track everything about your records:
id | amount | start_date  | end_date    | current_flag
---+--------+-------------+-------------+-------------
 1 | 100    | 01-Apr-2018 | 02-Apr-2018 | N
 2 | 80     | 04-Apr-2018 | NULL        | Y
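The close-and-insert step described above could be sketched like this; the table name amounts and the natural key column natural_id are assumptions for illustration:

```sql
BEGIN;
-- close the current version
UPDATE amounts
   SET end_date = current_date,
       current_flag = 'N'
 WHERE natural_id = 1
   AND current_flag = 'Y';
-- insert the new current version
INSERT INTO amounts (natural_id, amount, start_date, end_date, current_flag)
VALUES (1, 80, current_date, NULL, 'Y');
COMMIT;
```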
I'm trying to write a rule on a view to delete tuples from the component tables, but so far I can only remove data from one of them. I've used postgres with basic views for a while, but I don't have any experience with rules on views.
I wrote a stupid little test case to figure out/show my problem. There's only one parent tuple per child tuple in this example (my actual schema isn't actually like this of course).
Component tables:
CREATE TABLE parent(
id serial PRIMARY KEY,
p_data integer NOT NULL UNIQUE
);
CREATE TABLE child(
id serial PRIMARY KEY,
parent_id integer NOT NULL UNIQUE REFERENCES parent(id),
c_data integer NOT NULL
);
View:
CREATE TABLE child_view(
id integer,
p_data integer,
c_data integer
);
CREATE RULE "_RETURN" AS ON SELECT TO child_view DO INSTEAD
SELECT child.id, p_data, c_data
FROM parent JOIN child ON (parent_id=parent.id);
Problem delete rule
CREATE RULE child_delete AS ON DELETE TO child_view DO INSTEAD(
DELETE FROM child WHERE id=OLD.id;
DELETE FROM parent WHERE p_data=OLD.p_data;
);
The intent of the above rule is to remove tuples referenced in the view from the component tables. The WHERE p_data=OLD.p_data seems odd to me, but I don't see how else to reference the desired tuple in the parent table.
Here's what happens when I try to use the above rule:
>SELECT * FROM child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 10
2 | 2 | 11
3 | 3 | 12
(3 rows)
>DELETE FROM child_view WHERE id=3;
DELETE 0
>SELECT * FROM child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 10
2 | 2 | 11
(2 rows)
But looking at the parent table, the second part of the delete isn't working (id=3 "should" have been deleted):
>SELECT * FROM parent;
id | p_data
----+--------
1 | 1
2 | 2
3 | 3
(3 rows)
How should I write the deletion rule to remove both child and parent tuples?
This is using postgres v9.
Any help is appreciated. Also pointers to any materials covering rules on views beyond the postgres docs (unless I've obviously missed something) would also be appreciated. Thanks.
EDIT: as jmz points out, it would be easier to use a cascading delete than a rule here, but that approach doesn't work for my actual schema.
What you're seeing with the rule problem is that the rule system doesn't handle the data atomically. The first delete is executed regardless of the order of the two statements in the DO INSTEAD rule. The second statement is never executed, because the row that OLD.id refers to has already been removed from the view. You could use a LEFT JOIN, but that won't help you because of the example table design (it may work on your actual database schema).
The fundamental problem, as I see it, is that you're treating the rule system as if it were a trigger.
Your best option is to use foreign keys with ON DELETE CASCADE instead of rules. With them your example schema would work too: you'd only need a delete on the parent table to get rid of all the children.
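A sketch of the cascading variant for the example schema; only the REFERENCES clause changes:

```sql
CREATE TABLE child(
    id serial PRIMARY KEY,
    parent_id integer NOT NULL UNIQUE
        REFERENCES parent(id) ON DELETE CASCADE,
    c_data integer NOT NULL
);

-- now a single delete on parent removes the matching child row as well
DELETE FROM parent WHERE id = 3;
```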
What you want to do will work fine. But you made a left turn on this:
CREATE TABLE child_view(
id integer,
p_data integer,
c_data integer
);
CREATE RULE "_RETURN" AS ON SELECT TO child_view DO INSTEAD
SELECT child.id, p_data, c_data
FROM parent JOIN child ON (parent_id=parent.id);
You want a real view here, not a table. That is why the delete does not work.
CREATE VIEW child_view AS SELECT
child.id,
p_data,
c_data
FROM parent
JOIN child ON (parent_id=parent.id)
;
Replace the top with the bottom and it will work perfectly (it did when I tested it). The reason the delete does not work is that it is trying to delete the id from the TABLE child_view, which is of course empty! It does not execute the 'select do instead' rule, so it is working on the real table child_view. People may poo-poo using rules, but if they cannot spot such an obvious mistake, I wonder how much they know.
I have used rules successfully in defining interfaces to enforce business rules. They can lead to elegant solutions in ways triggers cannot.
Note: I only recommend this for making writable views for an interface. You could do clever things like checking constraints across tables, but you may be asking for trouble. That kind of stuff really should be done with triggers.
Edit: script per request
-- set this in case you had an error when running
-- from a script and did not notice it among all the NOTICES
\set ON_ERROR_STOP
drop table if exists parent cascade;
drop table if exists child cascade;
CREATE TABLE parent(
id serial PRIMARY KEY,
p_data integer NOT NULL UNIQUE
);
CREATE TABLE child(
id serial PRIMARY KEY,
parent_id integer NOT NULL UNIQUE REFERENCES parent(id),
c_data integer NOT NULL
);
CREATE VIEW child_view AS SELECT
child.id,
p_data,
c_data
FROM parent
JOIN child ON (parent_id=parent.id)
;
CREATE RULE child_delete AS ON DELETE TO child_view DO INSTEAD(
DELETE FROM child WHERE id=OLD.id;
DELETE FROM parent WHERE p_data=OLD.p_data;
);
insert into parent (p_data) values (1), (2), (3);
insert into child (parent_id, c_data) values (1, 1), (2, 2), (3, 3);
select * from child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
(3 rows)
delete from child_view where id=3;
DELETE 0
select * from child_view;
id | p_data | c_data
----+--------+--------
1 | 1 | 1
2 | 2 | 2
(2 rows)
There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for a select-star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
SQL Server 2008 shows a Clustered Index Scan in the same situation. What is the reason?
select * with no WHERE clause means: read every row in the table and fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, and read the row.
What happens when you do a full table scan? You go to the first row in the table, then read on through the table to the end.
Which of these is faster, given the table you have above? The full table scan. Why? Because it skips going to the index, retrieving values, and then going back to where the table lives to fetch the rows.
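For contrast, with a selective predicate on the primary key columns the index typically does pay off. A sketch against the table above; the plan shape shown in the comment is what one would expect, not a captured output:

```sql
EXPLAIN PLAN FOR
SELECT * FROM temp
WHERE IDR = 1 AND IDS = 2 AND DT = DATE '2020-01-01';
-- expected shape: INDEX UNIQUE SCAN on PKTEMP,
-- then TABLE ACCESS BY INDEX ROWID on TEMP
```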
To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this: the Clustered Index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan", or what Oracle calls "TABLE ACCESS FULL".