Automatic partitioning by day - PostgreSQL

I would like to create daily partitions. I know that in Oracle it looks something like this:
CREATE TABLE "PUBLIC"."TEST"
(
"ID" NUMBER(38,0) NOT NULL ENABLE,
"SOME_FIELD" VARCHAR2(20 BYTE) NOT NULL ENABLE,
"ANOTHER_FIELD" VARCHAR2(36 BYTE) NOT NULL ENABLE,
"TEST_DATE" TIMESTAMP NOT NULL ENABLE
)
TABLESPACE "TEST_DATA"
PARTITION BY RANGE ("TEST_DATE") INTERVAL (NUMTODSINTERVAL(1,'DAY'))
(PARTITION "TEST_P1"
VALUES LESS THAN (TIMESTAMP' 2019-01-01 00:00:00') TABLESPACE "TEST_DATA" );
What about PostgreSQL?
NEW EDIT:
SAMPLE SCRIPT:
The script below keeps the first 14 days of each month in one partition, "p1", and the remaining days in another partition, "p2".
1- The partition is created automatically, depending on the date of the row being inserted.
2- The script also shows how to add an index on the required columns.
3- Rows dated from the 1st to the 14th go into partition "p1"; the rest go into partition "p2".
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
CREATE OR REPLACE FUNCTION new_partition_creator() RETURNS trigger AS
$BODY$
DECLARE
partition_date TEXT;
partition TEXT;
partition_day int;
startdate date;
enddate date;
BEGIN
partition_day := to_char(NEW.logdate,'DD');
partition_date := to_char(NEW.logdate,'YYYY_MM');
IF partition_day < 15 THEN
partition := TG_RELNAME || '_' || partition_date || '_p1';
startdate := to_char(NEW.logdate,'YYYY-MM-01');
enddate := to_char(NEW.logdate,'YYYY-MM-14'); -- p1 covers the 1st through the 14th
ELSE
partition := TG_RELNAME || '_' || partition_date || '_p2';
startdate := to_char(NEW.logdate,'YYYY-MM-15');
enddate := date_trunc('MONTH', NEW.logdate) + INTERVAL '1 MONTH - 1 day';
END IF;
IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname=partition) THEN
RAISE NOTICE 'A partition has been created %',partition;
EXECUTE 'CREATE TABLE ' || partition || ' ( CHECK ( logdate >= DATE ''' || startdate || ''' AND logdate <= DATE ''' || enddate || ''' )) INHERITS (' || TG_RELNAME || ');';
EXECUTE 'CREATE INDEX ' || partition || '_logdate ON ' || partition || '(logdate)';
EXECUTE 'ALTER TABLE ' || partition || ' add primary key(city_id);';
END IF;
EXECUTE 'INSERT INTO ' || partition || ' SELECT(' || TG_RELNAME || ' ' || quote_literal(NEW) || ').* RETURNING city_id;';
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
CREATE TRIGGER testing_partition_insert_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE new_partition_creator();
postgres=# insert into measurement values(1,'2017-10-11',10,10);
NOTICE: A partition has been created measurement_2017_10_p1
INSERT 0 0

You can use the pg_partman extension for automatic partition creation.
https://github.com/pgpartman/pg_partman
Alternatively, you can use a scheduler such as pgAgent to run a procedure every day (say, at 18:00:00) that creates the next day's partition.
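A minimal sketch of what such a scheduled procedure could look like, assuming Postgres 10 or later and a parent table declared with PARTITION BY RANGE. The table name measurement_daily and the function name create_next_day_partition are hypothetical and only serve the illustration:

```sql
-- Hypothetical parent table, declared with range partitioning on the date column.
CREATE TABLE measurement_daily (
    city_id   int  NOT NULL,
    logdate   date NOT NULL,
    peaktemp  int,
    unitsales int
) PARTITION BY RANGE (logdate);

-- Hypothetical helper: creates tomorrow's partition if it does not exist yet.
CREATE OR REPLACE FUNCTION create_next_day_partition() RETURNS void AS
$$
DECLARE
    next_day date := current_date + 1;
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF measurement_daily FOR VALUES FROM (%L) TO (%L)',
        'measurement_daily_' || to_char(next_day, 'YYYY_MM_DD'),
        next_day,
        next_day + 1   -- the upper bound of a range partition is exclusive
    );
END;
$$ LANGUAGE plpgsql;

-- The scheduled job would then simply run:
SELECT create_next_day_partition();
```

Because of IF NOT EXISTS, running the job more than once on the same day is harmless.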

As of Postgres 12, PARTITION BY RANGE is supported (declarative partitioning was introduced in Postgres 10).
However, automatic creation of partitions (like Oracle's interval partitioning) is not supported. You have to create each partition manually.
The partition concept in Postgres also differs from Oracle's. In Oracle, a partition is a separate kind of object; in Postgres, a partition is itself a table. A partitioned table in Postgres does not contain data itself; the data lives in its partitions.
Table creation:
CREATE TABLE TEST (
ID INT NOT NULL,
LOG_DATE DATE)
PARTITION BY RANGE (LOG_DATE);
Partition creation:
-- The upper bound (TO) is exclusive, so each range ends on the first day of the next month.
CREATE TABLE TEST_MAR21
PARTITION OF TEST
FOR VALUES FROM ('01-MAR-2021') TO ('01-APR-2021');
CREATE TABLE TEST_APR21
PARTITION OF TEST
FOR VALUES FROM ('01-APR-2021') TO ('01-MAY-2021');
See https://www.postgresql.org/docs/current/ddl-partitioning.html for full documentation
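Since the question asks for daily partitions, here is a minimal sketch of the same approach with one partition per day, assuming the TEST table defined above. The partition names test_2021_05_01 and test_default are made up for illustration, and the DEFAULT partition requires Postgres 11 or later:

```sql
-- One partition per day: the range covers 2021-05-01 only, because TO is exclusive.
CREATE TABLE test_2021_05_01
    PARTITION OF test
    FOR VALUES FROM ('2021-05-01') TO ('2021-05-02');

-- Optional catch-all for rows that match no other partition.
CREATE TABLE test_default PARTITION OF test DEFAULT;

INSERT INTO test (id, log_date) VALUES (1, '2021-05-01'), (2, '2021-06-15');

-- Show which partition each row was routed to.
SELECT tableoid::regclass AS partition, id, log_date
FROM test
ORDER BY id;
```

A default partition keeps inserts from failing when a daily partition is missing, at the cost of having to move those rows out before a partition for the same range can be added.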

Postgres does support partitioning on values. However, it is not automatic: as of Postgres 10, partitions are not generated for you, so you need to create them manually after the base table has been created.
Please see the following link: https://www.postgresql.org/docs/10/ddl-partitioning.html
See if this example makes sense:
CREATE TABLE PartTest
(
idx INTEGER,
partMe Date
) PARTITION BY LIST (partMe);
CREATE TABLE PartTest_2019_04_11 PARTITION OF PartTest
FOR VALUES IN ('2019-04-11');

Related

Trigger taking time to insert data in postgres (column count 300)

I have created a trigger, but inserts become very slow when I insert multiple records.
Inserting 1 or 2 records works fine. But with more than 1000 records it is not fast; the query has been running for 2 hours.
The table below has only 15 columns. My actual table has 300 columns.
Is there any other way to insert multiple records into a table that has a trigger?
Table
create table patients (
id serial,
name character varying (50),
daily varchar (8),
month varchar (6),
quarter varchar (6),
registration_date timestamp,
age integer,
address text,
country text,
city text,
phone_number integer,
Education text,
Occupation text,
Marital_Status text,
"E-mail" text
);
trigger function
CREATE OR REPLACE FUNCTION update_data_after_insert_data_into_patients()
RETURNS trigger AS
$$BEGIN
update patients t1
set quarter=t2.quarter
from (SELECT (extract(year from registration_date)::text || 'Q' || extract(quarter from registration_date)::text) as quarter,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
update patients t1
set month=t2.month
from (select (extract(year from registration_date)::text || '' || to_char(registration_date,'MM')) as month,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
update patients t1
set daily=t2.daily
from (select extract(year from registration_date) || '' ||to_char(registration_date,'MM') || '' || to_char(registration_date,'DD') as daily,registration_date
from patients) t2 where t1.registration_date =t2.registration_date;
RETURN new;
END;
$$ LANGUAGE plpgsql;
Trigger definition
create TRIGGER trigger_update_data_after_insert_patients
AFTER insert ON patients
FOR EACH ROW
EXECUTE PROCEDURE update_data_after_insert_data_into_patients();
insert multiple records into patients table
INSERT INTO public.patients
("name", daily, "month", quarter, registration_date, age, address, country, city, phone_number, education, occupation, marital_status, "E-mail")
VALUES('Adam', '20221215', '202212', '2022Q4', '2022-08-17 19:01:10-08', 24, '', '', '', 1245578, '', '', '', '');
select statement
select * from patients;
Each inserted row fires three UPDATE statements, and because their subqueries are not restricted to the new row, each one rewrites every row in the table that has a registration_date - just to calculate those generated columns.
You can do this more efficiently by assigning the generated values to the NEW record in a BEFORE trigger.
CREATE OR REPLACE FUNCTION update_data_after_insert_data_into_patients()
RETURNS trigger AS
$$
BEGIN
new.quarter := to_char(new.registration_date, 'yyyy"Q"q');
new.month := to_char(new.registration_date, 'yyyymm');
new.daily := to_char(new.registration_date, 'yyyymmdd');
RETURN new;
END;
$$
LANGUAGE plpgsql;
create TRIGGER trigger_update_data_after_insert_patients
BEFORE insert ON patients
FOR EACH ROW
EXECUTE PROCEDURE update_data_after_insert_data_into_patients();
However I don't see the need to store these calculated values when you can easily format the registration_date when retrieving the data. I would get rid of those columns and the trigger and create a VIEW that does the formatting.
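A minimal sketch of such a view, assuming the patients table from the question. The view name patients_with_periods and the registration_* column names are hypothetical; they are chosen so the view works whether or not the stored daily, month and quarter columns have been dropped:

```sql
-- Hypothetical view: derives the period labels from registration_date at query time.
CREATE VIEW patients_with_periods AS
SELECT p.*,
       to_char(p.registration_date, 'yyyymmdd') AS registration_day,
       to_char(p.registration_date, 'yyyymm')   AS registration_month,
       to_char(p.registration_date, 'yyyy"Q"q') AS registration_quarter
FROM patients p;

-- Usage: query the view instead of the table when the labels are needed.
SELECT name, registration_day, registration_month, registration_quarter
FROM patients_with_periods;
```

With this approach nothing extra is written on insert, so bulk inserts are not slowed down by a trigger at all.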

Postgres partitioned table query scans all partitions instead of one

I have a table
CREATE TABLE IF NOT EXISTS prices
(shop_id integer not null,
good_id varchar(24) not null,
eff_date timestamp with time zone not null,
price_wholesale numeric(20,2) not null default 0 constraint chk_price_ws check (price_wholesale >= 0),
price_retail numeric(20,2) not null default 0 constraint chk_price_rtl check (price_retail >= 0),
constraint pk_prices primary key (shop_id, good_id, eff_date)
)partition by list (shop_id);
CREATE TABLE IF NOT EXISTS prices_1 partition of prices for values in (1);
CREATE TABLE IF NOT EXISTS prices_2 partition of prices for values in (2);
CREATE TABLE IF NOT EXISTS prices_3 partition of prices for values in (3);
CREATE TABLE IF NOT EXISTS prices_4 partition of prices for values in (4);
...
CREATE TABLE IF NOT EXISTS prices_6 partition of prices for values in (100);
I'd like to delete outdated prices. The table is huge, so I delete the records in small portions.
If I use a loop with the variable v_shop_id, then after 6 iterations Postgres starts scanning all partitions. I have simplified the code; the real code has an inner loop over shop_id.
If I use the loop without the variable (I specify the value explicitly), Postgres doesn't scan all partitions.
Here is the code with the variable:
do $$
declare
v_shop_id integer;
v_date_time timestamp with time zone := now();
begin
v_shop_id := 8;
for step in 1..10 loop
delete from prices p
using (select pd.good_id, max(pd.eff_date) as mxef_dt
from prices pd
where pd.eff_date < v_date_time - interval '30 days'
and pd.shop_id = v_shop_id
group by pd.good_id
having count(1)>1
limit 40000) pfd
where p.eff_date <= pfd.mxef_dt
and p.shop_id = v_shop_id
and p.good_id = pfd.good_id;
end loop;
end;$$ LANGUAGE plpgsql;
How can I force Postgres to scan only the desired partition?
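A likely cause, not confirmed in this thread, is PL/pgSQL plan caching: after several executions of the same statement, Postgres may switch to a generic plan in which v_shop_id is no longer a known constant, so partitions cannot be pruned at plan time. A minimal sketch of one workaround, assuming that is what happens here, is to build the statement with EXECUTE so that every iteration is planned with literal values:

```sql
do $$
declare
    v_shop_id   integer := 8;
    v_date_time timestamp with time zone := now();
begin
    for step in 1..10 loop
        -- format() embeds the current values as literals, so each execution
        -- is planned from scratch and the planner can see the shop_id value.
        execute format(
            'delete from prices p
              using (select pd.good_id, max(pd.eff_date) as mxef_dt
                       from prices pd
                      where pd.eff_date < %L::timestamptz - interval ''30 days''
                        and pd.shop_id = %s
                      group by pd.good_id
                     having count(1) > 1
                      limit 40000) pfd
              where p.eff_date <= pfd.mxef_dt
                and p.shop_id = %s
                and p.good_id = pfd.good_id',
            v_date_time, v_shop_id, v_shop_id);
    end loop;
end;
$$ language plpgsql;
```

On Postgres 12 and later, setting plan_cache_mode to force_custom_plan for the session may achieve the same effect without dynamic SQL.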

PostgreSQL Table Partition by Date

I am creating a table that will have over a million records. Data is appended once a day, adding the whole previous day's data. I want to partition the table on the daily_updated field. The table will keep a rolling one month of data, and I am thinking of partitioning by day. How can I maintain a rolling partition? I looked at some examples which state that you write a statement using PARTITION BY RANGE (daily_updated) and the table will auto-partition based on the data it holds.
Suggestions, please.
Thanks,
As noted by others, PostgreSQL doesn't have any built-in functionality for doing this automatically. I've been using the following (slightly edited) function and running it from a daily cron job:
CREATE OR REPLACE FUNCTION manage_partitions ()
RETURNS VOID
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
/**
Procedure manage_partitions creates (and disposes of) table partitions
for the SCHEMA_NAME.TABLE_NAME table.
*/
DECLARE
l_schema_name varchar := 'SCHEMA_NAME' ; -- The name of the schema for the partitioned table
l_table_name varchar := 'TABLE_NAME' ; -- The name of the table to manage partitions for
l_retention_days integer := 30 ; -- The number of past days to retain partitions for.
l_pre_days integer := 10 ; -- The number of future days to pre-create partitions for.
-- The intent is to maintain a buffer so that, in the
-- event that this function is not run for a few days,
-- the logging functionality can continue to work.
dt record ;
l_cmd text ;
BEGIN
-- ASSERTION: the schema name and table name of the partitioned table do not require quoting
-- ASSERTION: there are no other partitioned tables in the schema that have a similar name
-- ASSERTION: the table partitions reside in the same schema as the parent table
-- NB: in the event that there are ever any partitions that are desired to be preserved
-- beyond the retention schedule then they either need to be renamed in such fashion that
-- breaks the naming pattern or, better yet, manually detached from the parent table
FOR dt IN (
WITH args AS (
SELECT l_retention_days AS retention_days,
l_pre_days AS pre_days,
l_retention_days + l_pre_days AS total_days,
l_schema_name AS schema_name,
l_table_name AS table_name,
l_schema_name || '.' || l_table_name AS parent_table
),
dates AS (
SELECT ( current_date + ( ( s.idx - args.retention_days )::text || ' days'::text )::interval )::date AS partition_date
FROM args
CROSS JOIN (
SELECT idx
FROM generate_series ( 1, ( SELECT total_days FROM args ), 1 ) AS gs ( idx )
) s
),
new_parts AS (
SELECT args.parent_table || '_' || to_char ( dates.partition_date, 'yyyymmdd' ) AS partition_name,
to_char ( dates.partition_date, 'yyyy-mm-dd' ) AS partition_date
FROM dates
CROSS JOIN args
),
cur_parts AS (
SELECT n.nspname || '.' || c.relname AS partition_name
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n
ON ( n.oid = c.relnamespace )
JOIN pg_catalog.pg_inherits i
ON ( c.oid = i.inhrelid )
CROSS JOIN args
WHERE n.nspname = args.schema_name
AND c.relname::text ~ ( args.table_name || '_.+' )::text
)
SELECT args.parent_table,
cur_parts.partition_name AS current_partition,
new_parts.partition_name AS new_partition,
new_parts.partition_date
FROM cur_parts
FULL JOIN new_parts
ON ( cur_parts.partition_name = new_parts.partition_name )
CROSS JOIN args ) LOOP
IF dt.current_partition IS NULL THEN
l_cmd := 'CREATE TABLE '
|| dt.new_partition
|| ' PARTITION OF '
|| dt.parent_table
|| ' FOR VALUES FROM ( '''
|| dt.partition_date
|| '''::date ) TO ( ( '''
|| dt.partition_date
|| '''::date + ''1 day''::interval )::date )' ;
--RAISE NOTICE E' % ;', l_cmd ;
EXECUTE l_cmd ;
ELSIF dt.new_partition IS NULL THEN
l_cmd := 'ALTER TABLE '
|| dt.parent_table
|| ' DETACH PARTITION '
|| dt.current_partition ;
--RAISE NOTICE E' % ;', l_cmd ;
EXECUTE l_cmd ;
l_cmd := 'DROP TABLE ' || dt.current_partition ;
--RAISE NOTICE E' % ;', l_cmd ;
EXECUTE l_cmd ;
END IF ;
END LOOP ;
END ;
$$ ;
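To try the function, l_schema_name and l_table_name need to point at a real partitioned table. A minimal usage sketch, assuming they have been set to 'public' and 'daily_data' (both hypothetical names, matching the daily_updated column from the question):

```sql
-- Hypothetical parent table; the function above must be created with
-- l_schema_name := 'public' and l_table_name := 'daily_data'.
CREATE TABLE public.daily_data (
    id            bigint NOT NULL,
    daily_updated date   NOT NULL
) PARTITION BY RANGE (daily_updated);

-- Run once by hand; afterwards the daily cron job issues the same call.
SELECT manage_partitions();

-- List the partitions that now exist for the parent table.
SELECT inhrelid::regclass AS partition_name
FROM pg_catalog.pg_inherits
WHERE inhparent = 'public.daily_data'::regclass
ORDER BY inhrelid::regclass::text;
```

Uncommenting the RAISE NOTICE lines in the function is a simple way to review the generated statements before letting it drop anything.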

Auto-partitioning trigger doesn't work as expected

I'm trying to implement auto-partitioning of a table
CREATE TABLE incoming_ais_messages (
id uuid NOT NULL,
"source" int4 NOT NULL,
ais_channel varchar(8) NOT NULL,
is_read bool NOT NULL,
"time_stamp" timestamptz NOT null,
address_type varchar(32) NOT NULL,
"text" varchar NOT NULL,
CONSTRAINT incoming_ais_messages_pkey PRIMARY KEY (id,time_stamp)
) partition by range ("time_stamp");
For that I use a function:
create or replace function create_partition() returns trigger as $auto_partition$
begin
raise notice 'create_partion called';
execute 'create table if not exists incoming_ais_messages_partition_' || to_char(now()::date, 'yyyy_mm_dd') || ' partition of incoming_ais_messages
for values from (''' || to_char(now()::date, 'yyyy-mm-dd') || ''') to (''' || to_char((now() + interval '1 day')::date, 'yyyy-mm-dd') || ''');';
return new;
end;
$auto_partition$ language plpgsql;
And a trigger that should call it before any inserts:
create trigger auto_partition
before insert on incoming_ais_messages
for each row
execute procedure create_partition();
However when I insert something like:
INSERT INTO incoming_ais_messages (id, "source", ais_channel, is_read, "time_stamp", address_type, "text")
VALUES('123e4567-e89b-12d3-a456-426614174000'::uuid, 0, 'A', false, now(), 'DIRECT', 'text');
I get the error:
SQL Error [23514]: ERROR: no partition of relation "incoming_ais_messages" found for row
Detail: Partition key of the failing row contains (time_stamp) = (2022-07-21 18:01:41.787604+03).
After that I created the partition manually:
create table if not exists incoming_ais_messages_partition_1970_01_01 partition of incoming_ais_messages
for values from (now()::date) to ((now() + interval '1 day')::date);
executed the same insert statement and got the error:
SQL Error [55006]: ERROR: cannot CREATE TABLE .. PARTITION OF "incoming_ais_messages" because it is being used by active queries in this session
Where: SQL statement "create table if not exists incoming_ais_messages_partition_2022_07_21 partition of incoming_ais_messages
for values from ('2022-07-21') to ('2022-07-22');"
PL/pgSQL function create_partition() line 4 at EXECUTE
Would be great to know what is wrong here. My solution is based on the approach described here https://evilmartians.com/chronicles/a-slice-of-life-table-partitioning-in-postgresql-databases
(Section: Bonus: how to create partitions)
PostgreSQL wants to know which partition the new rows will go into before it calls BEFORE ROW triggers, so the error is thrown before the CREATE gets a chance to run. (Note that the blog example uses a trigger on one table to create a partition for a different table.)
Doing what you want is possible (the timescaledb extension does it, and you could research how if you want), but do yourself a favor and just pre-create a lot of partitions, and add a note to your calendar to add more in the future (as well as to drop old ones). Or write a cron job to do it.
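Pre-creating partitions can be sketched as a one-off DO block, assuming the incoming_ais_messages table from the question. The 90-day horizon is an arbitrary choice for illustration:

```sql
DO $$
DECLARE
    d date;
BEGIN
    -- Create one partition per day for today and the following 89 days.
    FOR i IN 0..89 LOOP
        d := current_date + i;
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS %I PARTITION OF incoming_ais_messages FOR VALUES FROM (%L) TO (%L)',
            'incoming_ais_messages_partition_' || to_char(d, 'yyyy_mm_dd'),
            d,
            d + 1   -- exclusive upper bound: the start of the next day
        );
    END LOOP;
END;
$$;
```

Because it runs outside any INSERT on the parent table, this avoids both errors shown in the question, and the same block can be re-run from a cron job to extend the horizon.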

Postgresql, select a "fake" row

In Postgres 8.4 or higher, what is the most efficient way to get a row of data populated by defaults without actually creating the row? E.g., as a transaction (pseudocode):
create table "mytable"
(
id serial PRIMARY KEY NOT NULL,
parent_id integer NOT NULL DEFAULT 1,
random_id integer NOT NULL DEFAULT random()
);
begin transaction
fake_row = insert into mytable (id) values (0) returning *;
delete from mytable where id=0;
return fake_row;
end transaction
Basically I'd expect a query with a single row where parent_id is 1 and random_id is a random number (or other function return value) but I don't want this record to persist in the table or impact on the primary key sequence serial_id_seq.
My options seem to be using a transaction like above or creating views which are copies of the table with the fake row added but I don't know all the pros and cons of each or whether a better way exists.
I'm looking for an answer that assumes no prior knowledge of the datatypes or default values of any column except id or the number or ordering of the columns. Only the table name will be known and that a record with id 0 should not exist in the table.
In the past I created the fake record 0 as a permanent record but I've come to consider this record a type of pollution (since I typically have to filter it out of future queries).
You can copy the table definition and defaults to the temp table with:
CREATE TEMP TABLE table_name_rt (LIKE table_name INCLUDING DEFAULTS);
And use this temp table to generate dummy rows. The table is dropped at the end of the session (or of the transaction, if created with ON COMMIT DROP) and is visible only to the current session.
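A minimal sketch of that approach applied to the question's table, assuming mytable exists as defined there. The temp table name mytable_defaults is hypothetical:

```sql
BEGIN;

-- Hypothetical temp table name; ON COMMIT DROP removes it when the transaction ends.
CREATE TEMP TABLE mytable_defaults (LIKE mytable INCLUDING DEFAULTS) ON COMMIT DROP;

-- Supplying id explicitly means the copied serial default (nextval) is never
-- evaluated, so the sequence of the real table is not advanced.
INSERT INTO mytable_defaults (id) VALUES (0) RETURNING *;

COMMIT;
```

The RETURNING clause yields the row with every other column filled from its default, and nothing is ever written to mytable itself.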
You can query the catalog and build a dynamic query
Say we have this table:
create table test10(
id serial primary key,
first_name varchar( 100 ),
last_name varchar( 100 ) default 'Tom',
age int not null default 38,
salary float default 100.22
);
When you run the following query:
SELECT string_agg( txt, ' ' order by id )
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, -9999 || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
you will get this string as a result:
"SELECT -9999 as id , null::character varying as first_name ,
'Tom'::character varying as last_name , 38 as age , 100.22 as salary"
Then execute this query and you will get the "phantom row".
We can build a function that builds and executes the query and returns our row as a result:
CREATE OR REPLACE FUNCTION get_phantom_rec (p_i test10.id%type )
returns test10 as $$
DECLARE
v_sql text;
myrow test10%rowtype;
begin
SELECT string_agg( txt, ' ' order by id )
INTO v_sql
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, p_i || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
EXECUTE v_sql INTO myrow;
RETURN myrow;
END$$ LANGUAGE plpgsql ;
and then this simple query gives you what you want:
select * from get_phantom_rec ( -9999 );
id | first_name | last_name | age | salary
-------+------------+-----------+-----+--------
-9999 | | Tom | 38 | 100.22
I would just select the fake values as literals:
select 1 id, 1 parent_id, 1 user_id
The returned row will be (virtually) indistinguishable from a real row.
To get the values from the catalog:
select
0 as id, -- special case for serial type, just return 0
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'parent_id') as parent_id,
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'user_id') as user_id;
Note that you must know what the columns are and their type, but this is reasonable. If you change the table schema (except default value), you would need to tweak the query.
See the above as a SQLFiddle.