Postgres partitioned table query scans all partitions instead of one

I have a table:
CREATE TABLE IF NOT EXISTS prices (
    shop_id integer NOT NULL,
    good_id varchar(24) NOT NULL,
    eff_date timestamp with time zone NOT NULL,
    price_wholesale numeric(20,2) NOT NULL DEFAULT 0 CONSTRAINT chk_price_ws CHECK (price_wholesale >= 0),
    price_retail numeric(20,2) NOT NULL DEFAULT 0 CONSTRAINT chk_price_rtl CHECK (price_retail >= 0),
    CONSTRAINT pk_prices PRIMARY KEY (shop_id, good_id, eff_date)
) PARTITION BY LIST (shop_id);
CREATE TABLE IF NOT EXISTS prices_1 PARTITION OF prices FOR VALUES IN (1);
CREATE TABLE IF NOT EXISTS prices_2 PARTITION OF prices FOR VALUES IN (2);
CREATE TABLE IF NOT EXISTS prices_3 PARTITION OF prices FOR VALUES IN (3);
CREATE TABLE IF NOT EXISTS prices_4 PARTITION OF prices FOR VALUES IN (4);
...
CREATE TABLE IF NOT EXISTS prices_100 PARTITION OF prices FOR VALUES IN (100);
I'd like to delete outdated prices. The table is huge, so I try to delete records in small batches.
If I use a loop with the variable v_shop_id, then after 6 iterations Postgres starts scanning all partitions. (I simplified the code; the real code has an inner loop over shop_id.)
If I use the loop without the variable (explicitly specifying the value), Postgres doesn't scan all partitions.
Here is the code with the variable:
do $$
declare
    v_shop_id integer;
    v_date_time timestamp with time zone := now();
begin
    v_shop_id := 8;
    for step in 1..10 loop
        delete from prices p
        using (select pd.good_id, max(pd.eff_date) as mxef_dt
               from prices pd
               where pd.eff_date < v_date_time - interval '30 days'
                 and pd.shop_id = v_shop_id
               group by pd.good_id
               having count(1) > 1
               limit 40000) pfd
        where p.eff_date <= pfd.mxef_dt
          and p.shop_id = v_shop_id
          and p.good_id = pfd.good_id;
    end loop;
end; $$ language plpgsql;
How can I force Postgres to scan only the desired partition?
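One likely explanation, not stated in the original post: PL/pgSQL caches plans for its parameterized statements, and after about five executions the planner may switch to a generic plan in which shop_id is no longer a known constant, so plan-time partition pruning stops working. A minimal sketch of two standard workarounds (plan_cache_mode requires PostgreSQL 12+):
-- Option 1 (PostgreSQL 12+): forbid generic plans for this session,
-- so every execution is planned with the actual parameter values.
SET plan_cache_mode = force_custom_plan;
-- Option 2: run the DELETE through EXECUTE ... USING; plans for EXECUTE
-- are never cached, so the planner always sees the concrete shop_id.
do $$
declare
    v_shop_id integer := 8;
    v_date_time timestamp with time zone := now();
begin
    for step in 1..10 loop
        execute
            'delete from prices p
             using (select pd.good_id, max(pd.eff_date) as mxef_dt
                    from prices pd
                    where pd.eff_date < $2 - interval ''30 days''
                      and pd.shop_id = $1
                    group by pd.good_id
                    having count(1) > 1
                    limit 40000) pfd
             where p.eff_date <= pfd.mxef_dt
               and p.shop_id = $1
               and p.good_id = pfd.good_id'
        using v_shop_id, v_date_time;
    end loop;
end; $$ language plpgsql;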

Related

Get row number of row to be inserted in Postgres trigger that gives no collisions when inserting multiple rows

Given the following (simplified) schema:
CREATE TABLE period (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT
);
CREATE TABLE course (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT
);
CREATE TABLE registration (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    period_id UUID NOT NULL REFERENCES period(id),
    course_id UUID NOT NULL REFERENCES course(id),
    inserted_at timestamptz NOT NULL DEFAULT now()
);
I now want to add a new column client_ref, which identifies a registration uniquely within a period but consists of only a 4-character string. I want to base the column value on pg_hashids, which requires a unique integer input.
I was thinking of setting up a trigger on the registration table that runs on inserting a new row. I came up with the following:
CREATE OR REPLACE FUNCTION set_client_ref()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    next_row_number integer;
BEGIN
    WITH rank AS (
        SELECT
            period.id AS period_id,
            row_number() OVER (PARTITION BY period.id ORDER BY registration.inserted_at)
        FROM registration
        JOIN period ON registration.period_id = period.id
        ORDER BY period.id, row_number
    )
    SELECT COALESCE(rank.row_number, 0) + 1 INTO next_row_number
    FROM period
    LEFT JOIN rank ON (rank.period_id = period.id)
    WHERE period.id = NEW.period_id
    ORDER BY rank.row_number DESC
    LIMIT 1;
    NEW.client_ref := id_encode(next_row_number);
    RETURN NEW;
END
$function$;
The trigger is set up like: CREATE TRIGGER set_client_ref BEFORE INSERT ON registration FOR EACH ROW EXECUTE FUNCTION set_client_ref();
This works as expected when inserting a single row into registration, but if I insert multiple rows within one statement, they end up with the same client_ref. I can reason about why this happens (the rows don't see each other's existence, so each assumes it is next in line when computing its row_number), but I am not sure how to prevent this. I tried setting the trigger up as an AFTER trigger, but it resulted in the same (duplicated) behaviour.
What would be a better way to get the lowest possible, unique integer for the rows to be inserted (to base the hash function on) that also works when inserting multiple rows?
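One approach (a sketch, not from the original question): keep a per-period counter in a helper table, here called period_counter, and bump it atomically inside the trigger with INSERT ... ON CONFLICT ... RETURNING. Each row of a multi-row insert then fires its own upsert and receives its own value:
CREATE TABLE period_counter (
    period_id UUID PRIMARY KEY REFERENCES period(id),
    last_value integer NOT NULL DEFAULT 0
);
CREATE OR REPLACE FUNCTION set_client_ref()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    next_row_number integer;
BEGIN
    -- Atomic per-period counter: the first registration of a period creates
    -- the counter row, later ones increment it; RETURNING hands the value back.
    INSERT INTO period_counter AS pc (period_id, last_value)
    VALUES (NEW.period_id, 1)
    ON CONFLICT (period_id)
    DO UPDATE SET last_value = pc.last_value + 1
    RETURNING pc.last_value INTO next_row_number;
    NEW.client_ref := id_encode(next_row_number);
    RETURN NEW;
END
$function$;
Note that concurrent transactions inserting into the same period will serialize on the counter row, which is usually acceptable for registration workloads.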

Weighted Random Selection

I have two tables with the most common first and last names. Each table basically has two fields:
Tables
CREATE TABLE "common_first_name" (
"first_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
CREATE TABLE "common_last_name" (
"last_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
P.S.: The top name occurs only ~1.8% of the time. The tables have 1000 rows each.
Function (Pseudo, not READY)
CREATE OR REPLACE FUNCTION create_sample_data(p_number_of_records INT)
RETURNS VOID
AS $$
DECLARE
    SUM_OF_WEIGHTS CONSTANT INT := 100;
BEGIN
    FOR i IN 1..coalesce(p_number_of_records, 0) LOOP
        -- Get a random first and last name, taking their probability (ratio)
        -- into consideration, e.g. round(random() * SUM_OF_WEIGHTS);
        -- create_person(random_first_name || ' ' || random_last_name);
        NULL;  -- placeholder so the pseudo-function compiles
    END LOOP;
END
$$
LANGUAGE plpgsql VOLATILE;
P.S.: The ratios in each table sum to 100%.
I want to run a function N times, getting a first and last name on each iteration to create sample data; both tables have 1000 rows each.
The sample size can be anywhere from 1,000 to 1,000,000 full names, so if there is a "fast" way of doing this weighted random selection, even better.
Any suggestion of how to do it in PL/PGSQL?
I am using PG 13.3 on SUPABASE.IO.
Thanks
Given the small input dataset, it's straightforward to do this in pure SQL. Use CTEs to build lower and upper bound columns for each row in each of the common_FOO_name tables, then use generate_series() to generate sets of random numbers. Join everything together, and use a WHERE clause that keeps the row whose bounds bracket each random value.
with first_names_weighted as (
select first_name,
sum(ratio) over (order by first_name) - ratio as lower_bound,
sum(ratio) over (order by first_name) as upper_bound
from common_first_name
),
last_names_weighted as (
select last_name,
sum(ratio) over (order by last_name) - ratio as lower_bound,
sum(ratio) over (order by last_name) as upper_bound
from common_last_name
),
randoms as (
select random() * (select sum(ratio) from common_first_name) as f_random,
random() * (select sum(ratio) from common_last_name) as l_random
from generate_series(1, 32)
)
select r, first_name, last_name
from randoms r
cross join first_names_weighted f
cross join last_names_weighted l
where f.lower_bound <= r.f_random and r.f_random <= f.upper_bound
and l.lower_bound <= r.l_random and r.l_random <= l.upper_bound;
Change the value passed to generate_series() to control how many names to generate. If it's important that it be a function, you can just use a LANGUAGE SQL function definition to parameterize that number (see the sketch after the fiddle link):
https://www.db-fiddle.com/f/mmGQRhCP2W1yfhZTm1yXu5/3
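A minimal sketch of such a wrapper (the function name create_sample_names is made up here; the body is the query above with the literal 32 replaced by the parameter):
-- Hypothetical wrapper around the answer's query, parameterized by p_count.
CREATE OR REPLACE FUNCTION create_sample_names(p_count int)
RETURNS TABLE (first_name text, last_name text)
LANGUAGE sql VOLATILE
AS $$
with first_names_weighted as (
    select first_name,
           sum(ratio) over (order by first_name) - ratio as lower_bound,
           sum(ratio) over (order by first_name) as upper_bound
    from common_first_name
),
last_names_weighted as (
    select last_name,
           sum(ratio) over (order by last_name) - ratio as lower_bound,
           sum(ratio) over (order by last_name) as upper_bound
    from common_last_name
),
randoms as (
    select random() * (select sum(ratio) from common_first_name) as f_random,
           random() * (select sum(ratio) from common_last_name) as l_random
    from generate_series(1, p_count)
)
select f.first_name, l.last_name
from randoms r
cross join first_names_weighted f
cross join last_names_weighted l
where f.lower_bound <= r.f_random and r.f_random <= f.upper_bound
  and l.lower_bound <= r.l_random and r.l_random <= l.upper_bound;
$$;
-- Usage: SELECT * FROM create_sample_names(1000);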

How can I update TABLE1 rows when I change some TABLE2 rows in POSTGRESQL?

I am building a soccer management tool where the league's admin can update the score of every match in the MATCHES table. At the same time I want to update columns in the TEAMS table.
For instance, if the match is DALLAS vs PHOENIX and the score was DALLAS 2 - PHOENIX 3, I want to update that match in the MATCHES table (I know how to do this), but at the same time I want to update the points of those two teams based on the result we just updated.
Is there a way to do that in PostgreSQL?
Thanks for your help.
You can do this with triggers. What is a database trigger? A database trigger is a special stored procedure that runs when specific actions occur within a database. Most triggers are defined to run when changes are made to a table's data. Triggers can be defined to run after (or before) INSERT, UPDATE, and DELETE on a table. Triggers use two special database objects, INSERTED and DELETED, to access the rows affected by the database action.
When a table record is inserted – use the INSERTED table to determine which rows were added to the table.
When a table record is deleted – use the DELETED table to see which rows were removed from the table.
When a table record is updated – use the INSERTED table to inspect the new or updated values and the DELETED table to see the values prior to the update.
(INSERTED and DELETED are SQL Server terminology; in PostgreSQL the INSERTED trigger object is called NEW and the DELETED object is called OLD.)
For example:
We have two tables, user_group and user_detail. I would like to insert 12 records into user_detail whenever a row is inserted into user_group:
CREATE TABLE examples.user_group (
id serial4 NOT NULL,
group_name varchar(200) NOT NULL,
user_id int4 NOT NULL
);
CREATE TABLE examples.user_detail (
id serial4 NOT NULL,
user_id int4 NOT NULL,
"month" int2 NOT NULL
);
-- create trigger function for inserting 12 records into user_detail table
CREATE OR REPLACE FUNCTION examples.f_user_group_after_insert()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
p_user_id integer;
begin
p_user_id := new.user_id; -- NEW is the trigger record holding the newly inserted user_group row
-- one row per month (1..12); generate_series replaces twelve copy-pasted inserts
insert into examples.user_detail (user_id, month)
select p_user_id, m
from generate_series(1, 12) as m;
return new;
end;
$function$
;
-- attach the trigger function to the user_group table, to run after each insert
create trigger user_group_after_insert
after insert on examples.user_group
for each row execute function examples.f_user_group_after_insert();
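Applied to the question itself, a minimal sketch (the matches and teams tables and the 3/1/0 scoring rule are assumptions, not from the original post):
-- Hypothetical schema: matches(home_team_id, away_team_id, home_score, away_score),
-- teams(id, points). Assumed rule: win = 3 points, draw = 1, loss = 0.
CREATE OR REPLACE FUNCTION apply_match_points()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    UPDATE teams
    SET points = points + CASE
            WHEN NEW.home_score > NEW.away_score THEN 3
            WHEN NEW.home_score = NEW.away_score THEN 1
            ELSE 0
        END
    WHERE id = NEW.home_team_id;
    UPDATE teams
    SET points = points + CASE
            WHEN NEW.away_score > NEW.home_score THEN 3
            WHEN NEW.away_score = NEW.home_score THEN 1
            ELSE 0
        END
    WHERE id = NEW.away_team_id;
    RETURN NEW;
END;
$function$;
CREATE TRIGGER match_points_after_update
AFTER UPDATE OF home_score, away_score ON matches
FOR EACH ROW EXECUTE FUNCTION apply_match_points();
If a score can be corrected after it was first entered, the trigger would also need to subtract the points previously awarded (available via OLD) before adding the new ones.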

Postgres - fill in missing data in new table

Given two tables, A and B:
A          B
-----      -----
id         id
high       high
low        low
bId
I want to find rows in table A where bId is null, create an entry in B based on the data in A, and update the row in A to reference the newly created row. I can create the rows, but I'm having trouble updating table A with the reference to the new row:
begin transaction;
with rows as (
insert into B (high, low)
select high, low
from A a
where a.bId is null
returning id as bId, a.id as aId
)
update A
set bId=(select bId from rows where id=rows.aId)
where id=rows.aId;
--commit;
rollback;
However, this fails with a cryptic error: ERROR: missing FROM-clause entry for table a.
Using a Postgres query, how can I achieve this?
Either
update "A"
set "bId" = (select "bId" from rows where id = rows."aId");
without the WHERE clause, or
update "A"
set "bId" = (select "bId" from rows where id = rows."aId")
from rows
where "A".id = rows."aId";
(The root cause of the error is that the RETURNING list of an INSERT can only reference columns of the newly inserted rows, so a.id is not visible there, and the outer UPDATE cannot reference rows without listing it in a FROM clause.) I don't know whether your tables really have those names; as mentioned in the comments, try to avoid uppercase table and field names, and avoid reserved keywords.
I found a way to get it to work but I feel like it's not the most efficient.
begin transaction;
do $body$
declare
newId int4;
tempB record;
begin
create temp table TempAB (
High float8,
Low float8,
AID int4
);
insert into TempAB (High, Low, AId)
select high, low, id
from A
where bId is null;
for tempB in (select * from TempAB)
loop
insert into B (high, low)
values (tempB.high, tempB.low)
returning id into newId;
update A
set bId=newId
where id=tempB.AId;
end loop;
end $body$;
rollback;
--commit;
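A single-pass alternative (a sketch; it assumes B.id is a serial/identity column backed by a sequence, here called b_id_seq, which is not stated in the question): pre-allocate the new B ids on A with nextval(), then insert into B using those ids.
begin transaction;
-- Assumption: B.id defaults to nextval('b_id_seq').
-- Step 1: hand out a fresh B id to every A row that lacks one.
update A
set bId = nextval('b_id_seq')
where bId is null;
-- Step 2: materialize the B rows under the pre-allocated ids.
insert into B (id, high, low)
select a.bId, a.high, a.low
from A a
where not exists (select 1 from B b where b.id = a.bId);
commit;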

Does postgres postgis ST_makeline have a max number of points it can create a line from?

My database has a table with tons of geometry(PointZ,4326) points. I am doing a lot of my processing on the database side, and I've noticed that when I use ST_MakeLine I seem to be hitting a cap on the number of points it will make a line from. My table and function/query are below.
It works as long as the number of track_points returned from the subquery is less than 97. I know this because the insert puts data in the table for all columns when there are 96 points or fewer. For all records with 97 or more points, all it inserts is the track_id, start_time and end_time.
I'm wondering if this is a bug in the ST_MakeLine function of PostGIS, or a setting in Postgres that I need to modify.
CREATE TABLE track_line_strings(
track_id bigint NOT NULL,
linestring geometry(LINESTRINGZ,4326),
start_time bigint NOT NULL,
end_time bigint NOT NULL,
CONSTRAINT track_line_strings_pk PRIMARY KEY (track_id)
);
CREATE OR REPLACE FUNCTION create_track_line_string() RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
TRACKITEMID bigint := new.track_item_id;
TRACKID bigint := track_id from track_item ti where ti.id = TRACKITEMID;
STARTTIME bigint := MIN(ti.item_time) from track_item ti where ti.track_id = TRACKID;
ENDTIME bigint := MAX(ti.item_time) from track_item ti where ti.track_id = TRACKID;
BEGIN
IF EXISTS (SELECT track_id from track_line_strings where track_id = TRACKID)
THEN
UPDATE track_line_strings
SET start_time = STARTTIME, end_time = ENDTIME, linestring = (
SELECT ST_Makeline(e.trackPosition) FROM
(
Select track_id, tp.track_position AS trackPosition
FROM track_point tp JOIN track_item ti ON tp.track_item_id = ti.id
where ti.track_id = TRACKID ORDER BY ti.item_time ASC
) E )
WHERE track_id = TRACKID;
ELSE
INSERT INTO track_line_strings(track_id, linestring, start_time, end_time)
SELECT TRACKID, ST_Makeline(e.trackPosition), STARTTIME, ENDTIME FROM
(
Select track_id, tp.track_position AS trackPosition
FROM track_point tp JOIN track_item ti ON tp.track_item_id = ti.id
where ti.track_id = TRACKID ORDER BY ti.item_time ASC
)E;
END IF;
RETURN new;
END;
$$;
The database limits are pretty high: up to 1 GB of geometry data in a single field. Depending on the kind of point geometry, that is on the order of tens of millions of point geometries that can be used to construct a LineString.
You would see a proper error message mentioning something like "exceeded size" if you were hitting a limitation.
Apparently empty or missing data in pgAdminIII is a common question, but it is not related to database limitations:
http://postgis.net/2013/10/05/tip_pgAdmin_shows_no_data
http://postgis.net/docs/manual-dev/PostGIS_FAQ.html#pgadmin_shows_no_data_in_geom
There doesn't appear to be a limit. I was viewing results in pgAdminIII, and there must be a limit on the number of characters its data output can display per column. I only realized this by copy-pasting the results into a text file and seeing that it did in fact return a value for the lines that have more than 96 points.
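A quicker way to check than copy-pasting into a text file (a sketch using standard PostGIS accessors):
-- ST_NPoints and ST_IsEmpty report on the stored geometry directly,
-- sidestepping any display truncation in the pgAdmin grid.
SELECT track_id,
       ST_NPoints(linestring) AS n_points,
       ST_IsEmpty(linestring) AS is_empty
FROM track_line_strings
ORDER BY track_id;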