I have a Postgres table (let's call it Events) with a composite foreign key to another table (let's call it Logs). The Events table looks like this:
CREATE TABLE Events (
ColPrimary UUID,
ColA VARCHAR(50),
ColB VARCHAR(50),
ColC VARCHAR(50),
PRIMARY KEY (ColPrimary),
FOREIGN KEY (ColA, ColB, ColC) REFERENCES Logs(ColA, ColB, ColC)
);
In this case, I know that I can efficiently search for Events by the primary key, and join to Logs.
What I am interested in is whether this foreign key creates an index on the Events table that can be useful even without joining. For example, would the following query benefit from the FK?
SELECT * FROM Events
WHERE ColA='foo' AND ColB='bar'
Note: I have run Postgres EXPLAIN for a very similar case and see that the query results in a full table scan. I am not sure whether this is because the FK is not helpful for this query, or because my data size is small and a scan is more efficient at my current scale.
PostgreSQL does not automatically create an index on the columns on which a foreign key is defined. If you need such an index, you will have to create it yourself.
It is usually a good idea to have such an index, so that modifications on the parent table that affect the referenced columns are efficient.
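For example, a minimal sketch against the table above (the index name is arbitrary); a B-tree index on the referencing columns can serve the WHERE clause in the question because it filters on a leading prefix of the indexed columns:
-- Not created automatically by the FOREIGN KEY; you have to create it yourself.
CREATE INDEX events_cola_colb_colc_idx ON Events (ColA, ColB, ColC);
Whether the planner actually uses it depends on cost estimates; at small table sizes a sequential scan can still be cheaper, which matches what you saw in EXPLAIN.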
Related
I am recreating an existing table as a partitioned table in PostgreSQL 11.
After some research, I am approaching it using the following procedure so this can be done online while writes are still happening on the table:
add a check constraint on the existing table, first as not valid and then validating
drop the existing primary key
rename the existing table
create the partitioned table under the prior table name
attach the existing table as a partition to the new partitioned table
My expectation was that the last step would be relatively fast, but I don't really have a number for this. In my testing, it's taking about 30s. I wonder if my expectations are incorrect or if I'm doing something wrong with the constraint or anything else.
Here's a simplified version of the DDL.
First, the inserted_at column is declared like this:
inserted_at timestamp without time zone not null
I want to keep an index on the ID for existing queries and writes even after I drop the PK, so I create one:
create unique index concurrently my_events_temp_id_index on my_events (id);
The check constraint is created in one transaction:
alter table my_events add constraint my_events_2022_07_events_check
check (inserted_at >= '2018-01-01' and inserted_at < '2022-08-01')
not valid;
In the next transaction, it's validated (and the validation is successful):
alter table my_events validate constraint my_events_2022_07_events_check;
Then before creating the partitioned table, I drop the primary key of the existing table:
alter table my_events drop constraint my_events_pkey cascade;
Finally, in its own transaction, the partitioned table is created:
alter table my_events rename to my_events_2022_07;
create table my_events (
id uuid not null,
... other columns,
inserted_at timestamp without time zone not null,
primary key (id, inserted_at)
) partition by range (inserted_at);
alter table my_events attach partition my_events_2022_07
for values from ('2018-01-01') to ('2022-08-01');
That last transaction blocks inserts and takes about 30s for the 12M rows in my test database.
Edit
I wanted to add that in response to the attach I see this:
INFO: partition constraint for table "my_events_2022_07" is implied by existing constraints
That makes me think I'm doing this right.
The problem is not the check constraint, it is the primary key.
If you make the original unique index include both columns:
create unique index concurrently my_events_temp_id_index on my_events (id,inserted_at);
And if you make the new table have a unique index rather than a primary key on those two columns, then the attach is nearly instantaneous.
These seem to me like unneeded restrictions in PostgreSQL: both that the unique index on one column can't be used to imply uniqueness on both columns, and that the unique index on both columns cannot be used to imply the primary key (nor even a unique constraint, but only a unique index).
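For reference, a sketch of the adjusted steps (table and index names follow the question; this assumes you can accept a unique index on (id, inserted_at) instead of a declared primary key on the partitioned table):
-- Replacement for the single-column index (while the old table is still named my_events):
create unique index concurrently my_events_temp_id_index on my_events (id, inserted_at);
-- After the rename, declare the new partitioned table without a primary key
-- and give it a unique index on the same two columns instead:
create table my_events (
    id uuid not null,
    -- ... other columns,
    inserted_at timestamp without time zone not null
) partition by range (inserted_at);
create unique index my_events_id_inserted_at_idx on my_events (id, inserted_at);
-- The attach can then reuse the existing index on my_events_2022_07:
alter table my_events attach partition my_events_2022_07
for values from ('2018-01-01') to ('2022-08-01');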
I am attempting to speed up a bulk load. The bulk load is performed into a table with all primary keys, indexes, and foreign keys dropped. After the data finishes loading, we apply the necessary constraints back to the database. As a simple test I have the following setup:
CREATE TABLE users
(
id int primary key
);
CREATE TABLE events
(
id serial,
user_id1 int,
user_id2 int,
unique_int1 int,
unique_int2 int,
unique_int3 int
);
INSERT INTO
users (id)
SELECT generate_series(1,100000000);
INSERT INTO
events (user_id1,user_id2,unique_int1,unique_int2,unique_int3)
SELECT
generate_series(1,100000000),
generate_series(1,100000000),
generate_series(1,100000000),
generate_series(1,100000000),
generate_series(1,100000000);
I then wish to apply the following constraints via individual ALTER TABLE statements:
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_user_id1_fkey FOREIGN KEY (user_id1) REFERENCES public.users(id);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_user_id2_fkey FOREIGN KEY (user_id2) REFERENCES public.users(id);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_unique_int1_me_key UNIQUE (unique_int1);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_unique_int2_me_key UNIQUE (unique_int2);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_unique_int3_me_key UNIQUE (unique_int3);
Each of the above statements takes approximately 90 seconds to run, for a total of 450 seconds. I expected that when I combined the above statements into a single ALTER TABLE statement the time would be reduced, but in fact it increased.
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_pkey PRIMARY KEY (id),
ADD CONSTRAINT events_user_id1_fkey FOREIGN KEY (user_id1) REFERENCES public.users(id),
ADD CONSTRAINT events_user_id2_fkey FOREIGN KEY (user_id2) REFERENCES public.users(id),
ADD CONSTRAINT events_unique_int1_me_key UNIQUE (unique_int1),
ADD CONSTRAINT events_unique_int2_me_key UNIQUE (unique_int2),
ADD CONSTRAINT events_unique_int3_me_key UNIQUE (unique_int3);
This takes 520 seconds, whereas I expected it to take less than 450 seconds. My reasoning comes from the Postgres documentation for the ALTER TABLE statement, where the Notes section reads:
The main reason for providing the option to specify multiple changes in a single ALTER TABLE is that multiple table scans or rewrites can thereby be combined into a single pass over the table.
Can anyone explain the above measurements or have any suggestions?
This case is not going to be helped much, as each of the subcommands needs its own pass to verify the conditions of its constraint (FK or UNIQUE). Also, from the docs:
When multiple subcommands are given, the lock acquired will be the strictest one required by any subcommand.
So by combining the subcommands, you hold the strictest lock required by any of them for the duration of the entire command.
The section you quoted is amplified further on:
For example, it is possible to add several columns and/or alter the type of several columns in a single command. This is particularly useful with large tables, since only one pass over the table need be made.
The conclusion is that combining commands is not necessarily a time saver.
Unifying the table read is particularly beneficial when there is no other I/O to be done beyond the scan itself, as with a column type change or a check constraint. But here you need to actually build the indexes or look up the foreign keys, and that is what dominates your run time. So it is no surprise that you don't see a big gain.
As for the small loss in performance, it could be due to worse cache usage, less efficient parallel query execution, or interrupted stride length for readahead or for bulk writes to your I/O devices.
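If the goal is to shave time off the dominant cost, the index builds themselves, session settings may help more than batching the subcommands. A hedged sketch, assuming PostgreSQL 11 or later and values that you would size to your own hardware:
-- Give the sorts behind the index builds more memory.
SET maintenance_work_mem = '2GB';
-- May allow parallel workers to be used for the underlying B-tree builds (PostgreSQL 11+).
SET max_parallel_maintenance_workers = 4;
ALTER TABLE ONLY public.events
    ADD CONSTRAINT events_pkey PRIMARY KEY (id);
-- ... remaining ADD CONSTRAINT statements as before ...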
I'm trying to partition a table with existing rows in PostgreSQL 10.8.
I'm trying to create partitions of the Item table, which has around 5 million rows.
I create the partitions with those commands:
CREATE TABLE item_1 (CHECK (id >0 AND id <1000001)) INHERITS (item);
CREATE TABLE item_2 (CHECK (id >1000000 AND id <2000001)) INHERITS (item);
...
Then the rules:
CREATE RULE item_1_rule AS ON INSERT TO item WHERE (id >0 AND id <1000001) DO INSTEAD INSERT INTO item_1 VALUES (NEW.*);
CREATE RULE item_2_rule AS ON INSERT TO item WHERE (id >1000000 AND id <2000001) DO INSTEAD INSERT INTO item_2 VALUES (NEW.*);
...
Then the migration to the partitioned tables:
INSERT INTO item_1 SELECT * FROM item WHERE (id >0 AND id <1000001);
INSERT INTO item_2 SELECT * FROM item WHERE (id >1000000 AND id <2000001);
...
And finally I try to delete all the rows from the Item table:
DELETE FROM ONLY item;
But I get this error:
ERROR: update or delete on table "item" violates foreign key constraint "item_audit_item_id_fkey" on table "item_audit"
SQL state: 23503
Detail: Key (id)=(1) is still referenced from table "item_audit".
So, is it possible to delete the data from the main Item table so that only the rows in the partition tables remain?
Are there alternative ways to set up the partitioning?
If you use inheritance partitioning, you won't be able to have foreign keys pointing to the partitioned table. You have to drop the constraint before you can delete the rows.
I recommend using declarative partitioning. If you upgrade to v12 or later, you can have a foreign key pointing to a partitioned table, but since the primary key on a partitioned table has to contain the partitioning key, you'd have to add that column to the foreign key as well.
Since partitioning is mostly useful for mass data deletion, it might make sense to partition item_audit the same way.
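A rough sketch of the declarative layout, assuming v12 or later for the foreign key from item_audit, with column names and types guessed from the error message and bounds matching the check constraints in the question:
-- The primary key must include the partitioning column, which is id here.
CREATE TABLE item (
    id bigint NOT NULL,
    -- ... other columns ...
    PRIMARY KEY (id)
) PARTITION BY RANGE (id);
CREATE TABLE item_1 PARTITION OF item FOR VALUES FROM (1) TO (1000001);
CREATE TABLE item_2 PARTITION OF item FOR VALUES FROM (1000001) TO (2000001);
-- From v12 on, the foreign key can point at the partitioned table itself.
ALTER TABLE item_audit
    ADD CONSTRAINT item_audit_item_id_fkey FOREIGN KEY (item_id) REFERENCES item (id);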
Hello, I want to delete all the data in my PostgreSQL tables, but not the tables themselves.
How can I do this?
Use the TRUNCATE TABLE command.
The content of a table (or tables) in a PostgreSQL database can be deleted in several ways.
Deleting table content using SQL:
Deleting content of one table:
TRUNCATE table_name;
DELETE FROM table_name;
Deleting content of all named tables:
TRUNCATE table_a, table_b, …, table_z;
Deleting content of named tables and tables that reference them (explained in more detail later in this answer):
TRUNCATE table_a, table_b CASCADE;
Deleting table content using pgAdmin:
Deleting content of one table:
Right click on the table -> Truncate
Deleting content of a table and tables that reference it:
Right click on the table -> Truncate Cascaded
Difference between DELETE and TRUNCATE:
From the documentation:
DELETE deletes rows that satisfy the WHERE clause from the specified
table. If the WHERE clause is absent, the effect is to delete all rows
in the table.
http://www.postgresql.org/docs/9.3/static/sql-delete.html
TRUNCATE is a PostgreSQL extension that provides a faster mechanism to
remove all rows from a table. TRUNCATE quickly removes all rows from a
set of tables. It has the same effect as an unqualified DELETE on each
table, but since it does not actually scan the tables it is faster.
Furthermore, it reclaims disk space immediately, rather than requiring
a subsequent VACUUM operation. This is most useful on large tables.
http://www.postgresql.org/docs/9.1/static/sql-truncate.html
Working with a table that is referenced by another table:
When a database has more than one table, the tables probably have relationships.
As an example there are three tables:
create table customers (
customer_id int not null,
name varchar(20),
surname varchar(30),
constraint pk_customer primary key (customer_id)
);
create table orders (
order_id int not null,
number int not null,
customer_id int not null,
constraint pk_order primary key (order_id),
constraint fk_customer foreign key (customer_id) references customers(customer_id)
);
create table loyalty_cards (
card_id int not null,
card_number varchar(10) not null,
customer_id int not null,
constraint pk_card primary key (card_id),
constraint fk_customer foreign key (customer_id) references customers(customer_id)
);
And some prepared data for these tables:
insert into customers values (1, 'John', 'Smith');
insert into orders values
(10, 1000, 1),
(11, 1009, 1),
(12, 1010, 1);
insert into loyalty_cards values (100, 'A123456789', 1);
Table orders references table customers, and table loyalty_cards references table customers. When you try to TRUNCATE / DELETE FROM a table that is referenced by other tables (i.e. the other tables have a foreign key constraint to the named table), you get an error. To delete the content of all three tables you have to name all of them (the order is not important):
TRUNCATE customers, loyalty_cards, orders;
or name just the referenced table and use the CASCADE keyword (you can name more than one table):
TRUNCATE customers CASCADE;
The same applies in pgAdmin: right click on the customers table and choose Truncate Cascaded.
For small tables DELETE is often faster and needs less aggressive locking (important for concurrent load):
DELETE FROM tbl;
With no WHERE condition.
For medium or bigger tables, go with TRUNCATE, like @Greg posted:
TRUNCATE tbl;
Hard to pin down the line between "small" and "big", as that depends on many variables. You'll have to test in your installation.
I found a very easy and fast way for anyone who uses a tool like DBeaver:
Select all the tables that you want to truncate (SHIFT + click or CTRL + click), then right click and start the truncate action from there.
If you have foreign keys, also select the CASCADE option in the settings panel. Start it, and that's all it takes!
I currently have a parent table:
CREATE TABLE members (
member_id SERIAL PRIMARY KEY,
first_name varchar(20),
last_name varchar(20),
address address, -- composite type
contact_numbers varchar(11)[3],
date_joined date,
type varchar(5)
);
and two related tables:
CREATE TABLE basic_members (
activities varchar[3] -- can only have 3 activities max
) INHERITS (members);
CREATE TABLE full_members (
activities varchar[] -- can have 0 to many activities
) INHERITS (members);
I also have another table:
CREATE TABLE planner (
day varchar(9) REFERENCES days(day),
time varchar(5) REFERENCES times(time),
activity varchar(20) REFERENCES activities(activity),
member bigint REFERENCES members(member_id)
);
ALTER TABLE planner ADD CONSTRAINT pk_planner PRIMARY KEY (day, time, activity, member);
I am currently trying to add a row with:
INSERT INTO planner VALUES ('monday','09:00','Weights',2);
I have added a row into full_members with:
INSERT INTO full_members
VALUES (Default, 'Hayley', 'Sargent', (12 Forest Road', 'Mansfield', 'Nottinghamshire', 'NG219DX'), '{01623485764,07789485763,01645586754}', 20120418, 'Full');
My insert into planner is currently not working. Can you explain why?
I managed to answer my own question. It was because, at the moment, PostgreSQL doesn't work very well with inheritance and foreign keys, so I have to create a rule:
CREATE RULE member_ref
AS ON INSERT TO planner
WHERE new.member NOT IN (SELECT member_id FROM members)
DO INSTEAD NOTHING;
This is basically the same as a foreign key check on insert, although the rule silently skips the offending row instead of raising an error, and it does nothing to prevent referenced members from being deleted.
Not sure if this is a better solution, but here it goes...
The principle is quite simple:
Create a new table, let's call it table_with_pkeys, which replicates the primary key column(s) of the inherited tables child1, child2, child3, and so on.
Create triggers on the inherited tables: after insert, insert the newly created PK into table_with_pkeys; after update, if the PK changes, update it there; and after delete, delete the same PK from table_with_pkeys.
Then, in every table that should reference child1, child2, or any other child through the parent table's PK, point that FK not at the parent's PK but at table_with_pkeys, which holds copies of all the children's PKs. That gives you an easily manageable way to have foreign keys that can cascade updates, restrict updates, and so on.
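A minimal sketch of the insert side of that idea, using the members/full_members example from the question (all names here are illustrative; the update and delete triggers would follow the same pattern):
-- Mirror table holding a copy of every child PK.
CREATE TABLE table_with_pkeys (
    member_id bigint PRIMARY KEY
);
-- Copy each newly inserted child PK into the mirror table.
CREATE FUNCTION mirror_member_pk() RETURNS trigger AS $$
BEGIN
    INSERT INTO table_with_pkeys (member_id) VALUES (NEW.member_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER full_members_mirror_pk
    AFTER INSERT ON full_members
    FOR EACH ROW EXECUTE PROCEDURE mirror_member_pk();
-- Referencing tables then point their FK at the mirror table instead of members.
ALTER TABLE planner
    ADD CONSTRAINT fk_member FOREIGN KEY (member) REFERENCES table_with_pkeys (member_id);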
Hope it helps.
You are missing an open quote before the 12 in the address:
INSERT INTO full_members
VALUES (Default, 'Hayley', 'Sargent', (12 Forest Road', 'Mansfield', 'Nottinghamshire', 'NG219DX'),
'{01623485764,07789485763,01645586754}',20120418,'Full');
should be:
INSERT INTO full_members
VALUES (Default, 'Hayley', 'Sargent', ('12 Forest Road', 'Mansfield', 'Nottinghamshire', 'NG219DX'),
'{01623485764,07789485763,01645586754}',20120418,'Full');
If the materialized view approach above doesn't work for you, create constraint triggers to check referential integrity. Unfortunately, declarative referential integrity doesn't work well with inheritance at present.
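A hedged sketch of such a constraint trigger for the planner.member column (function and trigger names are made up; this covers only inserts and updates on planner, and a companion trigger on members would be needed to guard against deletes):
-- Reject rows whose member is not found anywhere in the inheritance tree
-- (a plain SELECT on members also scans the child tables).
CREATE FUNCTION check_member_exists() RETURNS trigger AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM members WHERE member_id = NEW.member) THEN
        RAISE EXCEPTION 'member % does not exist', NEW.member;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE CONSTRAINT TRIGGER planner_member_fk
    AFTER INSERT OR UPDATE ON planner
    DEFERRABLE INITIALLY IMMEDIATE
    FOR EACH ROW EXECUTE PROCEDURE check_member_exists();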