Postgresql not choosing rows grouping - postgresql

I have a query that contains a construction like the example below: (online demo)
You will see the created_at field in the result. As written, I have to return the created_at field from the query, so it ends up in the SELECT list. I don't want to include the created_at field in the SELECT, because there are millions of records in the deposits table. How can I avoid this problem?
(Note: I have many tables to query, like the "deposits" table; this is just a short example.)
create table payment_methods
(
payment_method_id bigserial not null
constraint payment_methods_pkey
primary key
);
create table currencies_of_payment_methods
(
copm_id bigserial not null
constraint currencies_of_payment_methods_pkey
primary key,
payment_method_id integer not null
);
create table deposits
(
deposit_id bigserial not null
constraint deposits_pkey
primary key,
amount numeric(18,2) not null,
copm_id integer not null,
created_at timestamp(0)
);
INSERT INTO payment_methods (payment_method_id) VALUES (1);
INSERT INTO payment_methods (payment_method_id) VALUES (2);
INSERT INTO currencies_of_payment_methods (copm_id, payment_method_id) VALUES (1, 1);
INSERT INTO deposits (amount, copm_id, created_at) VALUES (100, 1, '2020-09-10 08:49:37');
INSERT INTO deposits (amount, copm_id, created_at) VALUES (200, 1, '2020-09-10 08:49:37');
INSERT INTO deposits (amount, copm_id, created_at) VALUES (40, 1, '2020-09-10 08:49:37');
Query:
SELECT payment_methods.payment_method_id,
       deposit_copm_id.deposit_copm_id,
       manuel_deposit_amount.manuel_deposit_amount,
       manuel_deposit_amount.created_at
FROM payment_methods
CROSS JOIN LATERAL (
    SELECT currencies_of_payment_methods.copm_id AS deposit_copm_id
    FROM currencies_of_payment_methods
    WHERE currencies_of_payment_methods.payment_method_id = payment_methods.payment_method_id
) deposit_copm_id
CROSS JOIN LATERAL (
    SELECT sum(deposits.amount) AS manuel_deposit_amount,
           array_agg(deposits.created_at) AS created_at
    FROM deposits
    WHERE deposits.copm_id = deposit_copm_id.deposit_copm_id
) manuel_deposit_amount
WHERE payment_methods.payment_method_id = 1

Related

How to pull out records based on array of values

Suppose the following structure:
CREATE SCHEMA IF NOT EXISTS my_schema;
CREATE TABLE IF NOT EXISTS my_schema.user (
id SERIAL PRIMARY KEY,
tag_id BIGINT NOT NULL
);
CREATE TABLE IF NOT EXISTS my_schema.conversation (
id SERIAL PRIMARY KEY,
user_ids BIGINT[] NOT NULL
);
INSERT INTO my_schema.user VALUES
(1, 55555),
(2, 77777);
INSERT INTO my_schema.conversation VALUES
(1, '{1,2}');
I can pull out the my_schema.conversation records if I know the my_schema.user.id values:
SELECT *
FROM my_schema.conversation
WHERE user_ids @> '{1}'
The above works, but I need to use my_schema.user.tag_id instead of my_schema.user.id:
How can I do this?
Fiddle
You would have to join the two tables on the array values
SELECT *
FROM my_schema.user u
JOIN my_schema.conversation c
ON u.id = any(c.user_ids)
WHERE u.tag_id=55555;
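As an alternative sketch (my assumption, not part of the original answer), you could avoid the join by looking up the user's id first and reusing the array containment operator from the question; the cast to bigint is needed because user_ids is a bigint[]:
-- Find conversations whose user_ids array contains the id of the user
-- with the given tag_id (assumes tag_id identifies a single user)
SELECT c.*
FROM my_schema.conversation c
WHERE c.user_ids @> ARRAY[(SELECT u.id::bigint
                           FROM my_schema.user u
                           WHERE u.tag_id = 55555)];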

Using 'on conflict' with a unique constraint on a table partitioned by date

Given the following table:
CREATE TABLE event_partitioned (
customer_id varchar(50) NOT NULL,
user_id varchar(50) NOT NULL,
event_id varchar(50) NOT NULL,
comment varchar(50) NOT NULL,
event_timestamp timestamp with time zone DEFAULT NOW()
)
PARTITION BY RANGE (event_timestamp);
And partitioning by calendar week [one example]:
CREATE TABLE event_partitioned_2020_51 PARTITION OF event_partitioned
FOR VALUES FROM ('2020-12-14') TO ('2020-12-20');
And the unique constraint [event_timestamp necessary since the partition key]:
ALTER TABLE event_partitioned
ADD UNIQUE (customer_id, user_id, event_id, event_timestamp);
I would like to update if customer_id, user_id, event_id exist, otherwise insert:
INSERT INTO event_partitioned (customer_id, user_id, event_id)
VALUES ('9', '99', '999')
ON CONFLICT (customer_id, user_id, event_id, event_timestamp) DO UPDATE
SET comment = 'I got updated';
But I cannot add a unique constraint on only customer_id, user_id, event_id; hence event_timestamp is included as well.
So this will insert duplicates of customer_id, user_id, event_id, even when adding now() as a fourth value, unless now() precisely matches what's already in event_timestamp.
Is there a way that ON CONFLICT could be less 'granular' here and update if now() falls in the week of the partition, rather than precisely on '2020-12-14 09:13:04.543256' for example?
Basically I am trying to avoid duplication of customer_id, user_id, event_id, at least within a week, but still benefit from partitioning by week (so that data retrieval can be narrowed to a date range and not scan the entire partitioned table).
I don't think you can do this with on conflict in a partitioned table. You can, however, express the logic with CTEs:
with
data as ( -- data
select '9' as customer_id, '99' as user_id, '999' as event_id
),
ins as ( -- insert if not exists
insert into event_partitioned (customer_id, user_id, event_id)
select * from data d
where not exists (
select 1
from event_partitioned ep
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
)
returning *
)
update event_partitioned ep -- update if insert did not happen
set comment = 'I got updated'
from data d
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
and not exists (select 1 from ins)
@GMB's answer is great and works well. Since enforcing a unique constraint on a partitioned (parent) table that is partitioned by time range is usually not that useful, why not just place a unique constraint/index on the partition itself?
In your case, event_partitioned_2020_51 can have a unique constraint:
ALTER TABLE event_partitioned_2020_51
ADD UNIQUE (customer_id, user_id, event_id, event_timestamp);
And a subsequent query can just use
INSERT INTO event_partitioned_2020_51 ... ON CONFLICT (customer_id, user_id, event_id, event_timestamp)
as long as this is the partition intended, which is usually the case.
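For illustration, a hedged sketch of what that statement might look like in full (the column values and the explicit timestamp are made up, not from the original answer):
-- Insert directly into the weekly partition; the partition-level unique
-- constraint provides the arbiter index for ON CONFLICT.
INSERT INTO event_partitioned_2020_51 (customer_id, user_id, event_id, comment, event_timestamp)
VALUES ('9', '99', '999', 'first comment', '2020-12-14 09:13:04')
ON CONFLICT (customer_id, user_id, event_id, event_timestamp)
DO UPDATE SET comment = 'I got updated';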

Postgres exclude using gist across different tables

I have 2 tables like this
drop table if exists public.table_1;
drop table if exists public.table_2;
CREATE TABLE public.table_1 (
id serial NOT NULL,
user_id bigint not null,
status varchar(255) not null,
date_start date NOT NULL,
date_end date NULL
);
CREATE TABLE public.table_2 (
id serial NOT NULL,
user_id bigint not null,
status varchar(255) not null,
date_start date NOT NULL,
date_end date NULL
);
alter table public.table_1
add constraint my_constraint_1
EXCLUDE USING gist (user_id with =, daterange(date_start, date_end, '[]') WITH &&)
where (status != 'deleted');
alter table public.table_2
add constraint my_constraint_2
EXCLUDE USING gist (user_id with =, daterange(date_start, date_end, '[]') WITH &&)
where (status != 'deleted');
Every table contains rows which are related to a user, and all the rows of the same user cannot overlap in range. In addition, some rows may be logically deleted, so I added a where condition.
So far it's working w/o problems, but the 2 constraints work separately for each table.
I need to create a constraint which covers the 2 sets of tables, so that a single daterange (of the same user and not deleted) may appear only once across the 2 different tables.
Can the EXCLUDE constraint be extended to work across different tables, or do I need to check it with a trigger? If a trigger is the answer, what is the simplest way to do this? Create a temporary table with the union of the 2, add the constraint on it, and check whether it fails?
Starting from @Laurenz Albe's suggestion, this is what I made:
-- #################### SETUP SAMPLE TABLES ####################
drop table if exists public.table_1;
drop table if exists public.table_2;
CREATE TABLE public.table_1 (
id serial NOT NULL,
user_id bigint not null,
status varchar(255) not null,
date_start date NOT NULL,
date_end date NULL
);
CREATE TABLE public.table_2 (
id serial NOT NULL,
user_id bigint not null,
status varchar(255) not null,
date_start date NOT NULL,
date_end date NULL
);
alter table public.table_1
add constraint my_constraint_1
EXCLUDE USING gist (user_id with =, daterange(date_start, date_end, '[]') WITH &&)
where (status != 'deleted');
alter table public.table_2
add constraint my_constraint_2
EXCLUDE USING gist (user_id with =, daterange(date_start, date_end, '[]') WITH &&)
where (status != 'deleted');
-- #################### SETUP TRIGGER ####################
create or REPLACE FUNCTION check_date_overlap_trigger_hook()
RETURNS trigger as
$body$
DECLARE
l_table text;
l_sql text;
l_row record;
begin
l_table := TG_ARGV[0];
l_sql := format('
select *
from public.%s as t
where
t.user_id = %s -- Include only records of the same user
and t.status != ''deleted'' -- Include only records that are active
', l_table, new.user_id);
for l_row in execute l_sql
loop
IF daterange(l_row.date_start, COALESCE(l_row.date_end, 'infinity'::date)) && daterange(new.date_start, COALESCE(new.date_end, 'infinity'::date))
THEN
RAISE EXCEPTION 'Date interval is overlapping with another one in table %', l_table
USING HINT = 'You can''t have the same interval across table1 AND table2';
END IF;
end loop;
RETURN NEW;
end
$body$
LANGUAGE plpgsql;
-- #################### INSTALL TRIGGER ####################
create trigger check_date_overlap
BEFORE insert or update
ON public.table_1
FOR EACH row
EXECUTE PROCEDURE check_date_overlap_trigger_hook('table_2');
create trigger check_date_overlap
BEFORE insert or update
ON public.table_2
FOR EACH row
EXECUTE PROCEDURE check_date_overlap_trigger_hook('table_1');
-- #################### INSERT DEMO ROWS ####################
insert into public.table_1 (user_id, status, date_start, date_end) values (1, 'active', '2020-12-10', '2020-12-20');
insert into public.table_1 (user_id, status, date_start, date_end) values (1, 'deleted', '2020-12-15', '2020-12-25');
insert into public.table_1 (user_id, status, date_start, date_end) values (2, 'active', '2020-12-10', '2020-12-20');
insert into public.table_1 (user_id, status, date_start, date_end) values (2, 'deleted', '2020-12-15', '2020-12-25');
-- This will fail for overlap on the same table
-- insert into public.table_1 (user_id, status, date_start, date_end) values (1, 'active', '2020-12-15', '2020-12-25');
-- This will fail as the user 1 already has an overlapping period on table 1
-- insert into public.table_2 (user_id, status, date_start, date_end) values (1, 'active', '2020-12-15', '2020-12-25');
-- This will fail as the user 1 already has an overlapping period on table 1
insert into public.table_2 (user_id, status, date_start, date_end) values (1, 'deleted', '2020-12-15', '2020-12-25');
update public.table_2 set status = 'active' where id = 1;
select 'table_1' as src_table, * from public.table_1
union
select 'table_2', * from public.table_2
You can probably use a trigger, but triggers are always vulnerable to race conditions (unless you are using SERIALIZABLE isolation).
If your tables really have the same columns, why don't you use a single table (and perhaps add a type column to disambiguate)?
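A minimal sketch of that single-table idea (the table and column names below are my assumptions, and the constraint still relies on the btree_gist extension like the originals):
-- One table replaces table_1 and table_2; "src" records which logical
-- table a row belongs to, and a single exclusion constraint covers all rows.
CREATE TABLE public.table_combined (
    id         serial       NOT NULL,
    src        varchar(255) NOT NULL,   -- e.g. 'table_1' or 'table_2'
    user_id    bigint       NOT NULL,
    status     varchar(255) NOT NULL,
    date_start date         NOT NULL,
    date_end   date         NULL
);
ALTER TABLE public.table_combined
    ADD CONSTRAINT my_constraint_combined
    EXCLUDE USING gist (user_id WITH =, daterange(date_start, date_end, '[]') WITH &&)
    WHERE (status != 'deleted');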

Inner join request to get the items available for the user

I have four tables:
CREATE TABLE t_users (
user_id varchar PRIMARY KEY,
user_email varchar
);
CREATE TABLE t_items (
item_id varchar PRIMARY KEY,
owner_id varchar not null references t_users(user_id),
title varchar
);
CREATE TABLE t_access_gropes (
access_group_id varchar PRIMARY KEY,
user_id varchar not null references t_users(user_id)
);
CREATE TABLE t_access_sets (
access_set_id varchar PRIMARY KEY,
item_id varchar not null references t_items(item_id),
access_group_id varchar not null references t_access_gropes(access_group_id)
);
With data:
INSERT INTO t_users VALUES ('us123', 'us123#email.com');
INSERT INTO t_users VALUES ('us456', 'us456#email.com');
INSERT INTO t_users VALUES ('us789', 'us789#email.com');
INSERT INTO t_items VALUES ('it123', 'us123', 'title1');
INSERT INTO t_items VALUES ('it456', 'us456', 'title2');
INSERT INTO t_items VALUES ('it678', 'us789', 'title3');
INSERT INTO t_items VALUES ('it323', 'us123', 'title4');
INSERT INTO t_items VALUES ('it764', 'us456', 'title5');
INSERT INTO t_items VALUES ('it826', 'us789', 'title6');
INSERT INTO t_items VALUES ('it568', 'us123', 'title7');
INSERT INTO t_items VALUES ('it038', 'us456', 'title8');
INSERT INTO t_items VALUES ('it728', 'us789', 'title9');
INSERT INTO t_access_gropes VALUES ('ag123', 'us123');
INSERT INTO t_access_gropes VALUES ('ag456', 'us456');
INSERT INTO t_access_gropes VALUES ('ag789', 'us789');
INSERT INTO t_access_sets VALUES ('as123', 'it123', 'ag123');
INSERT INTO t_access_sets VALUES ('as456', 'it456', 'ag123');
The t_access_gropes table forms groups of users.
The t_access_sets table forms security kits.
How can I write a query to get all the items available for a user? Something like:
select *
from t_items
inner join t_users on t_items.owner_id = t_users.user_id
inner join t_access_gropes on t_users.user_id = t_access_gropes.user_id
inner join t_access_sets on t_items.item_id = t_access_sets.item_id
where t_access_gropes.user_id = 'us123';
Thank you.
select u.user_id, u.user_email, i.item_id, i.title
from t_users u
join t_items i on i.owner_id = u.user_id
where u.user_id = 'us123'
I believe this is what you want for your exact request!
Otherwise, what you wrote is fine; however, I don't see the relevance of the tables you joined, as you directly connected your users table and items table, which is why you wouldn't need to join the other two tables (groups & sets). In some cases you will find a user_items table in between the users and items tables.
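If the access groups and sets are meant to grant visibility, a hedged sketch (my interpretation, not confirmed by the question) of "all items available to a user" could combine owned items with items shared through an access set:
-- Items the user owns, plus items granted through any access group the
-- user belongs to (DISTINCT in case an item qualifies both ways).
SELECT DISTINCT i.item_id, i.title
FROM t_items i
LEFT JOIN t_access_sets s   ON s.item_id = i.item_id
LEFT JOIN t_access_gropes g ON g.access_group_id = s.access_group_id
WHERE i.owner_id = 'us123'
   OR g.user_id = 'us123';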

Selecting one specific data row (required), and 3 others (specific data row must be included)

I need to select a specific row and 2 other rows that are not that specific row (a total of 3). The specific row must always be included in the 3 results. How should I go about it? I think it can be done with a UNION ALL, but do I have another choice? Thanks all! :)
Here are my scripts to create the sample tables:
create table users (
user_id serial primary key,
user_name varchar(20) not null
);
create table result_table1 (
result_id serial primary key,
user_id int4 references users(user_id),
result_1 int4 not null
);
create table result_table2 (
result_id serial primary key,
user_id int4 references users(user_id),
result_2 int4 not null
);
insert into users (user_name) values ('Kevin'),('John'),('Batman'),('Someguy');
insert into result_table1 (user_id, result_1) values (1, 20),(2, 40),(3, 70),(4, 42);
insert into result_table2 (user_id, result_2) values (1, 4),(2, 3),(3, 7),(4, 5);
Here is my UNION query:
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id = 1
UNION ALL
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id != 1
LIMIT 3;
Are there any options other than a UNION? The query works and does what I want for now, but will it always include user_id = 1 if I had a larger set of rows (assume that user_id = 1 will always be there)? :(
Thank you all! :)
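A hedged sketch of one alternative (my assumption, not from an answer in this thread): applying the LIMIT only to the branch that excludes the specific row guarantees that row is never pushed out, regardless of how many rows the tables contain:
-- The specific row, always returned
(SELECT r1.user_id, r1.result_1, r2.result_2
 FROM result_table1 r1
 JOIN result_table2 r2 ON r2.user_id = r1.user_id
 WHERE r1.user_id = 1)
UNION ALL
-- Exactly two other rows, so the total stays at 3
(SELECT r1.user_id, r1.result_1, r2.result_2
 FROM result_table1 r1
 JOIN result_table2 r2 ON r2.user_id = r1.user_id
 WHERE r1.user_id != 1
 LIMIT 2);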