Create many partitions as a result of a select statement in Postgres - postgresql

Here is a table of phone numbers named phone_number:
phone_number   | country_code | owner
---------------+--------------+----------
07911 123456   | 44           | Ada
08912 654321   | 44           | Thomas
06 12 34 56 78 | 33           | Jonathan
06 87 65 43 21 | 33           | Arthur
Let's say we want to partition this table by country code, which means creating this table, phone_number_bis:
CREATE TABLE phone_number_bis (
phone_number VARCHAR,
country_code INTEGER,
owner VARCHAR NOT NULL,
PRIMARY KEY (phone_number, country_code)
) PARTITION BY LIST(country_code);
Loading the content of phone_number into phone_number_bis will produce the following error:
INSERT INTO phone_number_bis( phone_number, country_code, owner)
SELECT phone_number, country_code, owner
FROM phone_number;
ERROR: no partition of relation "phone_number_bis" found for row
Partition key of the failing row contains (country_code) = (44)
Is there a SQL command that could create all necessary partitions before loading data into phone_number_bis, without knowing the content of the country_code column in advance?
NB: as Franck Heikens pointed out, partitioning the table may not be relevant for storing phone numbers. This example was made up to make a complex problem more understandable.

If your client is psql, you can use \gexec to make it run a query and then run each result as a new command. You would then need to write one query which outputs a text string containing a suitable CREATE TABLE statement for each distinct country_code. To do it entirely on the server side, you could use PL/pgSQL to do much the same thing: construct a string and then use dynamic SQL to EXECUTE it.
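The string-building step described above can be sketched outside the database. The Python below is only an illustration and not part of the original answer: the function name partition_ddl is hypothetical, and the input list simply repeats the country codes from the sample table. It shows the kind of text that the generating query (or a PL/pgSQL loop) has to produce, one statement per distinct country_code.

```python
def partition_ddl(parent, codes):
    """Build one CREATE TABLE ... PARTITION OF statement per distinct code."""
    statements = []
    for code in sorted(set(codes)):
        code = int(code)  # the partition key is an INTEGER; anything else raises
        statements.append(
            f"CREATE TABLE {parent}_{code} PARTITION OF {parent} "
            f"FOR VALUES IN ({code});"
        )
    return statements


# country_code values taken from the sample phone_number table
ddl = partition_ddl("phone_number_bis", [44, 44, 33, 33])
for statement in ddl:
    print(statement)
```

Duplicates are removed first, so each country code yields exactly one partition, which is the same effect as SELECT DISTINCT in the generating query.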

Is there a SQL command that could create all necessary partitions
before loading data into phone_number_bis, not knowing the content of
the country_code column in advance ?
You can use a DEFAULT partition, then split new partitions out of the DEFAULT partition afterwards.
begin;
create table phone_number(phone_number text,country_code integer, owner text);
insert into phone_number select 'dummy_' || (random()::numeric(10,4)),
120 + i,'owner'||i from generate_series(1, 5) g(i);
insert into phone_number select 'dummy_' || (random()::numeric(10,4)),
220+i,'owner'||i from generate_series(1, 5) g(i);
insert into phone_number select 'dummy_' || (random()::numeric(10,4)), 1 ,'owner'||i from generate_series(1, 20) g(i);
select string_agg(distinct (country_code::text),', ' order by (country_code::text))
from phone_number
where country_code > 99 and country_code < 201;
commit;
BEGIN;
CREATE TABLE phone_number_bis (
phone_number text,country_code integer,OWNER text,
PRIMARY KEY (phone_number, country_code)
)
PARTITION BY LIST (country_code);
CREATE TABLE phone_number_bis_01 PARTITION OF phone_number_bis
FOR VALUES IN (1);
CREATE TABLE phone_number_bis_100_200 PARTITION OF phone_number_bis
FOR VALUES IN (121, 122, 123, 124, 125);
CREATE TABLE phone_number_bis_default PARTITION OF phone_number_bis DEFAULT;
INSERT INTO phone_number_bis (phone_number, country_code, OWNER)
SELECT phone_number, country_code,OWNER FROM phone_number;
COMMIT;
Now split the default partition: create a new partition for the values in the 200 to 300 range and move the matching rows out of the old default.
BEGIN;
ALTER TABLE phone_number_bis DETACH PARTITION phone_number_bis_default;
ALTER TABLE phone_number_bis_default RENAME TO phone_number_bis_default_old;
CREATE TABLE phone_number_bis_200_300 PARTITION OF phone_number_bis
FOR VALUES IN (221, 222, 223, 224, 225);
CREATE TABLE phone_number_bis_default PARTITION OF phone_number_bis DEFAULT;
INSERT INTO phone_number_bis (phone_number, country_code, OWNER)
SELECT phone_number, country_code, OWNER FROM phone_number_bis_default_old;
DROP TABLE phone_number_bis_default_old; -- all of its rows were re-inserted above
COMMIT;
https://www.postgresql.org/docs/current/ddl-partitioning.html
Quote:
Choosing the target number of partitions that the table should be
divided into is also a critical decision to make. Not having enough
partitions may mean that indexes remain too large and that data
locality remains poor which could result in low cache hit ratios.
However, dividing the table into too many partitions can also cause
issues. Too many partitions can mean longer query planning times and
higher memory consumption during both query planning and execution, as
further described below.
Quote:
Another reason to be concerned about having a large number of
partitions is that the server's memory consumption may grow
significantly over time, especially if many sessions touch large
numbers of partitions. That's because each partition requires its
metadata to be loaded into the local memory of each session that
touches it.
More partitions consume more memory; here is one reported case: https://www.postgresql.org/message-id/flat/PH0PR11MB5191F459DCB44A91682FE8C8D6409%40PH0PR11MB5191.namprd11.prod.outlook.com#86aaad1ddd6350efc062c2dd79a31821

As Laurenz Albe and jjanes said, it cannot be done with a single SQL command.
An anonymous PL/pgSQL block seems to be required here:
DO $$
DECLARE partition_number INTEGER;
BEGIN
FOR partition_number IN SELECT DISTINCT(country_code) FROM phone_number
LOOP
EXECUTE FORMAT('CREATE TABLE phone_number_bis_%s PARTITION OF phone_number_bis FOR VALUES IN (%s)', partition_number, partition_number);
END LOOP ;
END;
$$ LANGUAGE plpgsql;
INSERT INTO phone_number_bis( phone_number, country_code, owner)
SELECT phone_number, country_code, owner
FROM phone_number; -- No error as partitions have been created before insertion

Related

How to get value list of list partitioning table of postgresql?

I am trying to use list partitioning in PostgreSQL.
https://www.postgresql.org/docs/current/ddl-partitioning.html
So, I have some questions about that.
Is there a limit on the number of values or partition tables in list partitioning?
When a partitioned table is created as shown below, can I list its partition values with SQL? (like keys = [test, test_2])
CREATE TABLE part_table (id int, branch text, key_name text) PARTITION BY LIST (key_name);
CREATE TABLE part_default PARTITION OF part_table DEFAULT;
CREATE TABLE part_test PARTITION OF part_table FOR VALUES IN ('test');
CREATE TABLE part_test_2 PARTITION OF part_table FOR VALUES IN ('test_2');
When using the partitioned table created above, a row added with key_name = 'test_3' goes into the default partition. If 'test_3' already exists in the default partition and I then try to create a partition for that value, the error shown below occurs.
In this case, is there a good way to create a partition for the value 'test_3' without deleting the rows in the default partition?
CREATE TABLE part_test_3 PARTITION OF part_table FOR VALUES IN ('test_3');
Error: updated partition constraint for default partition "part_default" would be violated by some row
Is it possible to change the table name or value of a partition table?
Thank you..!
Is there a limit on the number of values or partition tables in list
partitioning?
Some tests: https://www.depesz.com/2021/01/17/are-there-limits-to-partition-counts/
To see which values currently reside in which partition:
SELECT
tableoid::pg_catalog.regclass,
array_agg(DISTINCT key_name)
FROM
part_table
GROUP BY
1;
To list all current partitions together with their configured bounds, use the following:
SELECT
c.oid::pg_catalog.regclass,
c.relkind,
inhdetachpending as is_detached,
pg_catalog.pg_get_expr(c.relpartbound, c.oid)
FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i
WHERE c.oid = i.inhrelid
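AND i.inhparent = 'part_table'::regclass;
-- 'part_table'::regclass resolves to the OID of the parent table
-- (58281 in the database the answer was written against).
-- The OID can also be looked up explicitly:
select c.oid
from pg_catalog.pg_class c
where relname ='part_table';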

How to use the same common table expression in two consecutive psql statements?

I'm trying to perform a pretty basic operation with a few steps:
SELECT data from table1
Use id column from my selected table to remove data from table2
Insert the selected table from step 1 into table2
I would imagine that this would work
begin;
with temp as (
select id
from table1
)
delete from table2
where id in (select id from temp);
insert into table2 (id)
select id from temp;
commit;
But I'm getting an error saying that temp is not defined during my insert step?
Only other post I found about this is this one but it didn't really answer my question.
Thoughts?
From Postgres documentation:
WITH provides a way to write auxiliary statements for use in a larger
query. These statements, which are often referred to as Common Table
Expressions or CTEs, can be thought of as defining temporary tables
that exist just for one query.
If you need a temp table for more than one query you can do instead:
begin;
create temp table temp_table as (
select id
from table1
);
delete from table2
where id in (select id from temp_table);
insert into table2 (id)
select id from temp_table;
commit;
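The scoping rule quoted from the documentation can be checked with a small experiment. The Python sketch below is an illustration only: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example runs anywhere), and the CTE name picked and the sample rows are hypothetical. It shows that a CTE is visible only inside its own statement, while a temporary table remains available to later statements.

```python
import sqlite3

# SQLite is only a stand-in here; the question itself is about PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES (2), (3);
""")

# The CTE "picked" exists only for the duration of this single DELETE.
conn.execute("""
    WITH picked AS (SELECT id FROM table1)
    DELETE FROM table2 WHERE id IN (SELECT id FROM picked)
""")

# A second statement can no longer see the CTE.
try:
    conn.execute("INSERT INTO table2 (id) SELECT id FROM picked")
    cte_visible = True
except sqlite3.OperationalError:
    cte_visible = False

# A temporary table, by contrast, survives across statements.
conn.execute("CREATE TEMP TABLE temp_table AS SELECT id FROM table1")
conn.execute("INSERT INTO table2 (id) SELECT id FROM temp_table")
rows = sorted(r[0] for r in conn.execute("SELECT id FROM table2"))
print(cte_visible, rows)
```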

The query result changes in remote and local database

The below table is created in local database and remote databases.
CREATE TABLE EMPLOYEE1 ( EMP_ID INTEGER, EMP_NAME VARCHAR(10), EMP_DEPT VARCHAR(10) );
Insert the below rows in tables created in both the databases.
INSERT INTO EMPLOYEE1 (EMP_ID, EMP_NAME,EMP_DEPT)
VALUES (1,'A','IT'), (2,'B','IT'), (3,'C','SALES'), (4,'D','SALES'), (5,'E','ACCOUNTS'), (6,'F','ACCOUNTS'), (7,'G','HR'), (8,'H','HR');
COMMIT;
If I run the query below in the local database on my system, the result is correct, i.e. it returns all the rows in the table, exactly as the query should. But if I run the same query in the remote database, only 4 rows are returned, which is a wrong result.
SELECT * FROM EMPLOYEE1 WHERE (EMP_DEPT NOT IN ('IT','SALES') OR EMP_DEPT IN ('IT','SALES'));
Can anyone suggest why the query behavior changes?
Your query is meant to select all the records, so you can simply use the following:
SELECT * FROM EMPLOYEE1
What is the purpose of this condition?
WHERE (EMP_DEPT NOT IN ('IT','SALES') OR EMP_DEPT IN ('IT','SALES'))
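One general SQL point about this condition: it is not a pure tautology, because a row whose EMP_DEPT is NULL satisfies neither branch. The sample rows in the question contain no NULLs, so this does not by itself explain the 4-row result; it only shows where the condition differs from having no WHERE clause at all. The sketch below is an illustration under assumptions: it uses Python's standard sqlite3 module rather than the poster's database, and the row with a NULL department is hypothetical.

```python
import sqlite3

# SQLite stands in for the poster's database; the NULL rules shown are standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee1 (emp_id INTEGER, emp_dept TEXT)")
conn.executemany(
    "INSERT INTO employee1 VALUES (?, ?)",
    [(1, "IT"), (2, "SALES"), (3, "HR"), (4, None)],  # row 4: hypothetical NULL dept
)

matched = [
    row[0]
    for row in conn.execute(
        "SELECT emp_id FROM employee1 "
        "WHERE (emp_dept NOT IN ('IT','SALES') OR emp_dept IN ('IT','SALES')) "
        "ORDER BY emp_id"
    )
]
print(matched)  # the NULL row (emp_id 4) is filtered out
```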

Change the starting value of a serial - Postgresql

I have a small problem with a serial column. I filled my database from a file, and each row has a client ID (a serial, which is my primary key). I have 300 clients, so 300 client IDs (1 to 300). I also have a form for new clients, but I cannot add them: when I add a client, my program tries to insert it with ID 1, and ID 1 is already assigned to another client.
So my question is: is it possible to change the starting value of a serial to resolve this problem?
You can alter a sequence using RESTART WITH to set the value that the next call to nextval will return. With IDs 1 to 300 already in use, that value is 301:
ALTER SEQUENCE test_seq RESTART WITH 301;
To get the sequence name if you created it using the serial keyword, look at the column default (the adsrc column was removed in PostgreSQL 12, so pg_get_expr is used here):
SELECT pg_get_expr(adbin, adrelid) FROM pg_attrdef WHERE adrelid = (SELECT oid FROM pg_class WHERE relname = 'table name goes here');
PostgreSQL
ALTER SEQUENCE tablename_columnname_seq RESTART WITH anynumber;
Example:
ALTER SEQUENCE test_table_rec_id_seq RESTART WITH 4615793;
If your PostgreSQL version is newer than the one the answer above was written for, you can get the sequence name with select pg_get_serial_sequence('ingredients', 'id');
On versions before PostgreSQL 12 (which removed the adsrc column), this also works:
SELECT adsrc FROM pg_attrdef WHERE adrelid = (SELECT oid FROM pg_class WHERE relname = 'ingredients');
For those who try to point to specific schema's table, here's the query you will need to execute.
-- To get the sequence name
SELECT pg_get_serial_sequence('"yourSchema"."yourTable"', 'yourColumn');
-- Output: "yourSchema"."yourTable_yourColumn_seq"
ALTER SEQUENCE "yourSchema"."yourTable_yourColumn_seq" RESTART WITH 100;
The solutions above did not work for what I needed.
I needed a serial id to use as a primary key that started from 1000, rather than 1.
In order to do this, I created a standard serial column:
ALTER table my_table ADD COLUMN new_id SERIAL PRIMARY KEY;
and then updated that column:
UPDATE my_table set new_id = new_id + 1000;
I then joined that table to the table with existing non-consecutive id numbers under 1000.

Compact or renumber IDs for all tables, and reset sequences to max(id)?

After running for a long time, I get more and more holes in the id field. Some tables' id are int32, and the id sequence is reaching its maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, which would break the API.
I'd like to renumber them all. This may be not good practice, but good or bad is not concerned in this question. I want to renumber, especially, those very long IDs like "61789238", "548273826529524324". I don't know why they are so long, but shorter IDs are also easier to handle manually.
But it's not easy to compact IDs by hand because of references and constraints.
Does PostgreSQL itself support ID renumbering? Or is there a plugin or maintenance utility for this job?
Maybe I can write some stored procedures? That would be very nice so I can schedule it once a year.
The question is old, but we got a new question from a desperate user on dba.SE after trying to apply what is suggested here. Find an answer with more details and explanation over there:
Compacting a sequence in PostgreSQL
The currently accepted answer will fail for most cases.
Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (OP mentions references and constraints.) Such constraints are checked after each row, so you will most likely get unique violation errors when you try. Details:
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example seems to retain the original sequence because physical storage still coincides with the desired order (inserted rows in desired order just a moment earlier), which is almost never the case in real world applications and completely unreliable.
The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:
BEGIN;
LOCK tbl;
-- remove all FK constraints to the column
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
-- for the simple case without FK references - or see below:
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
-- Update referencing value in FK columns at the same time (if any)
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
-- add all FK constraints to the column back
COMMIT;
This is also much faster for big tables, because checking PK (and FK) constraint(s) for every row costs a lot more than removing the constraint(s) and adding it (them) back.
If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.
Example for a table fk_tbl and a FK column fk_id:
WITH u1 AS (
UPDATE tbl t
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id
RETURNING t.id, t1.new_id -- return old and new ID
)
UPDATE fk_tbl f
SET fk_id = u1.new_id -- set to new ID
FROM u1
WHERE f.fk_id = u1.id; -- match on old ID
More in the referenced answer on dba.SE.
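The logic of the two UPDATE statements above can be followed on plain data. The Python sketch below is an illustration only, not a replacement for the SQL: the function name renumber and the sample IDs are hypothetical. It builds the same old-to-new mapping that row_number() OVER (ORDER BY id) produces, applies it to the key column and to the referencing column, and derives the value the sequence should continue from.

```python
def renumber(ids):
    """Map each old id to its 1-based rank in ascending order (gaps closed)."""
    return {old: new for new, old in enumerate(sorted(ids), start=1)}


tbl_ids = [3, 7, 8, 20]   # hypothetical tbl.id values with gaps
fk_ids = [7, 7, 20, 3]    # hypothetical fk_tbl.fk_id values referencing tbl.id

mapping = renumber(tbl_ids)
new_tbl_ids = [mapping[i] for i in tbl_ids]   # corresponds to the UPDATE on tbl
new_fk_ids = [mapping[i] for i in fk_ids]     # corresponds to the UPDATE on fk_tbl

# After setval('tbl_id_seq', max(id)), the next nextval() returns max(id) + 1.
next_id = max(new_tbl_ids) + 1
print(mapping, new_fk_ids, next_id)
```

Because the mapping is computed from the old IDs before anything is changed, the original order is preserved and every referencing row follows its parent.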
Assuming your ids are generated from a bignum sequence, just RESTART the sequence and update the table with idcolumn = DEFAULT.
CAVEAT: If this id column is used as a foreign key by other tables, make sure you have the on update cascade modifier turned on.
For example:
Create the table, put some data in, and remove a middle value:
db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
4 | are
5 | you
(4 rows)
Reset your sequence:
db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE
Update your data:
db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
3 | are
4 | you
(4 rows)
Another approach: add a new id column and Foreign Key(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step.)
\i tmp.sql
-- the test tables
CREATE TABLE one (
id serial NOT NULL PRIMARY KEY
, payload text
);
CREATE TABLE two (
id serial NOT NULL PRIMARY KEY
, the_fk INTEGER REFERENCES one(id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);
-- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;
INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;
-- make some gaps
DELETE FROM one WHERE id % 13 > 0;
-- SELECT * FROM two;
-- Add new keycolumns to one and two
ALTER TABLE one
ADD COLUMN new_id SERIAL NOT NULL UNIQUE
;
-- UPDATE:
-- This could need DEFERRABLE
-- Note since the update is only a permutation of the
-- existing values, we dont need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
ALTER TABLE two
ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
;
-- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
;
SELECT * FROM two;
-- The crucial part: the final renaming
-- (at this point it would be better not to allow other sessions
-- messing with the {one,two} tables ...
-- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);
ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);
-- Some checks.
-- (the automatically generated names for the indexes
-- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two
UPDATE: added the permutation of new_id (after creating it as a serial)
Funny thing is: it doesn't seem to need 'DEFERRABLE'.
This script works for PostgreSQL.
This is a generic solution that works for all cases.
This query finds the description of the fields of all tables in any database.
WITH description_bd AS (select colum.schemaname,coalesce(table_name,relname) as table_name , column_name, ordinal_position, column_default, data_type, is_nullable, character_maximum_length, is_updatable,description from
( SELECT columns.table_schema as schemaname,columns.table_name, columns.column_name, columns.ordinal_position, columns.column_default, columns.data_type, columns.is_nullable, columns.character_maximum_length, columns.character_octet_length, columns.is_updatable, columns.udt_name
FROM information_schema.columns
) colum
full join (SELECT schemaname, relid, relname,objoid, objsubid, description
FROM pg_statio_all_tables ,pg_description where pg_statio_all_tables.relid= pg_description.objoid ) descre
on descre.relname = colum.table_name and descre.objsubid=colum.ordinal_position and descre.schemaname=colum.schemaname )
This query proposes a way to fix the sequences of all database tables: it generates, in the req column, a statement that fixes the sequence of each table.
The generated statement takes the maximum value of the column, adds one, and sets the sequence to that value.
SELECT table_name, column_name, ordinal_position,column_default,
data_type, is_nullable, character_maximum_length, is_updatable,
description,'SELECT setval('''||schemaname||'.'|| replace(replace(column_default,'''::regclass)',''),'nextval(''','')||''', (select max( '||column_name ||')+1 from '|| table_name ||' ), true);' as req
FROM description_bd where column_default like '%nextva%'
Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job.
It is called like this :
=> SELECT resequence('port','id','port_id_seq');
resequence
--------------
5090 -> 3919
Takes 3 parameters
name of table
name of column that is SERIAL
name of sequence that the SERIAL uses
The function returns a short report of what it has done, with the previous value of the sequence and the new value.
The function LOOPs over the table ORDERed by the named column and makes an UPDATE for each row. Then sets the new value for the sequence. That's it.
The order of the values is preserved.
No ADDing and DROPing of temporary columns or tables involved.
No DROPing and ADDing of constraints and foreign keys needed.
Of course, you had better have ON UPDATE CASCADE on those foreign keys.
The code :
CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
_old BIGINT;_new BIGINT := 0;
BEGIN
FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
_new=_new+1;
EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
END LOOP;
RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;