Selecting one specific data row (required) plus 2 others, for a total of 3 - postgresql

I need to select a specific row and 2 other rows that are not that specific row (a total of 3). The specific row must always be included in the 3 results. How should I go about it? I think it can be done with a UNION ALL, but do I have another choice? Thanks all! :)
Here are my scripts to create the sample tables:
create table users (
    user_id serial primary key,
    user_name varchar(20) not null
);

create table result_table1 (
    result_id serial primary key,
    user_id int4 references users(user_id),
    result_1 int4 not null
);

create table result_table2 (
    result_id serial primary key,
    user_id int4 references users(user_id),
    result_2 int4 not null
);
insert into users (user_name) values ('Kevin'),('John'),('Batman'),('Someguy');
insert into result_table1 (user_id, result_1) values (1, 20),(2, 40),(3, 70),(4, 42);
insert into result_table2 (user_id, result_2) values (1, 4),(2, 3),(3, 7),(4, 5);
Here is my UNION query:
SELECT result_table1.user_id,
       result_1,
       result_2
FROM result_table1
INNER JOIN (
    SELECT user_id
    FROM users
) users ON users.user_id = result_table1.user_id
INNER JOIN (
    SELECT result_table2.user_id,
           result_2
    FROM result_table2
) result_table2 ON result_table2.user_id = result_table1.user_id
WHERE users.user_id = 1
UNION ALL
SELECT result_table1.user_id,
       result_1,
       result_2
FROM result_table1
INNER JOIN (
    SELECT user_id
    FROM users
) users ON users.user_id = result_table1.user_id
INNER JOIN (
    SELECT result_table2.user_id,
           result_2
    FROM result_table2
) result_table2 ON result_table2.user_id = result_table1.user_id
WHERE users.user_id != 1
LIMIT 3;
Are there any options other than a UNION? The query works and does what I want for now, but will it always include user_id = 1 on a larger set of rows (assume that user_id = 1 will always be there)? :(
Thank you all! :)
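For what it's worth, a UNION-free sketch (my own suggestion, assuming user_id = 1 always exists; the join through users is omitted because it does not change the output here): order by a boolean expression so the required row sorts first, then apply the limit. Note that in the posted query the LIMIT 3 applies to the whole UNION ALL result, and without an ORDER BY the row order is not guaranteed, so on a larger table the user_id = 1 row could be dropped.

SELECT t1.user_id, result_1, result_2
FROM result_table1 t1
INNER JOIN result_table2 t2 ON t2.user_id = t1.user_id
ORDER BY (t1.user_id = 1) DESC, -- true sorts before false under DESC, so the required row survives the limit
         t1.user_id
LIMIT 3;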


How to pull out records based on array of values

Suppose the following structure:
CREATE SCHEMA IF NOT EXISTS my_schema;

CREATE TABLE IF NOT EXISTS my_schema.user (
    id SERIAL PRIMARY KEY,
    tag_id BIGINT NOT NULL
);

CREATE TABLE IF NOT EXISTS my_schema.conversation (
    id SERIAL PRIMARY KEY,
    user_ids BIGINT[] NOT NULL
);

INSERT INTO my_schema.user VALUES
    (1, 55555),
    (2, 77777);

INSERT INTO my_schema.conversation VALUES
    (1, '{1,2}');
I can pull out the my_schema.conversation records if I know the my_schema.user.id values:
SELECT *
FROM my_schema.conversation
WHERE user_ids @> '{1}';
The above works, but I need to use my_schema.user.tag_id instead of my_schema.user.id.
How can I do this?
Fiddle
You would have to join the two tables on the array values:
SELECT *
FROM my_schema.user u
JOIN my_schema.conversation c
  ON u.id = ANY(c.user_ids)
WHERE u.tag_id = 55555;
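If you only need the conversation rows, and want to avoid duplicate rows when several users share a tag, an EXISTS variant of the same idea is a reasonable sketch:

SELECT c.*
FROM my_schema.conversation c
WHERE EXISTS (SELECT 1
              FROM my_schema.user u
              WHERE u.tag_id = 55555
                AND u.id = ANY(c.user_ids));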

Is it possible to find duplicating records in two columns simultaneously in PostgreSQL?

I have the following database schema (oversimplified):
create sequence partners_partner_id_seq;

create table partners
(
    partner_id integer default nextval('partners_partner_id_seq'::regclass) not null primary key,
    name varchar(255) default NULL::character varying,
    company_id varchar(20) default NULL::character varying,
    vat_id varchar(50) default NULL::character varying,
    is_deleted boolean default false not null
);
INSERT INTO partners(name, company_id, vat_id) VALUES('test1','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test2','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test3','3214567890102', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test4','9999999999999', 'GE9999999999999');
I am trying to figure out how to return test1, test2 (because the company_id column value duplicates vertically) and test3 (because the vat_id column value duplicates vertically as well).
In other words, I need to find duplicating company_id and vat_id records and group them together, so that test1, test2 and test3 end up together, because they duplicate by company_id and vat_id.
So far I have the following query:
SELECT *
FROM (
    SELECT *, LEAD(row, 1) OVER () AS nextrow
    FROM (
        SELECT *, ROW_NUMBER() OVER (w) AS row
        FROM partners
        WHERE is_deleted = false
          AND ((company_id != '' AND company_id IS NOT NULL) OR (vat_id != '' AND vat_id IS NOT NULL))
        WINDOW w AS (PARTITION BY company_id, vat_id ORDER BY partner_id DESC)
    ) x
) y
WHERE (row > 1 OR nextrow > 1)
  AND is_deleted = false
This successfully shows all company_id duplicates, but does not appear to show the vat_id ones - the test3 row is missing. Is it possible to do this within one query?
Here is a db-fiddle with the schema, data and predefined query reproducing my result.
You can do this with recursion, but depending on the size of your data you may want to iterate instead.
The trick is to make the name just another match key instead of treating it differently than the company_id and vat_id:
create table partners (
    partner_id integer generated always as identity primary key,
    name text,
    company_id text,
    vat_id text,
    is_deleted boolean not null default false
);

insert into partners (name, company_id, vat_id) values
    ('test1','1010109191191', 'BG1010109191192'),
    ('test2','1010109191191', 'BG1010109191192'),
    ('test3','3214567890102', 'BG1010109191192'),
    ('test4','9999999999999', 'GE9999999999999'),
    ('test5','3214567890102', 'BG8888888888888'),
    ('test6','2983489023408', 'BG8888888888888');
I added a couple of test cases and left in the lone partner.
with recursive keys as (
    select partner_id,
           array['n_'||name, 'c_'||company_id, 'v_'||vat_id] as matcher,
           array[partner_id] as matchlist,
           1 as size
    from partners
), matchers as (
    select *
    from keys
    union all
    select p.partner_id, c.matcher,
           p.matchlist||c.partner_id as matchlist,
           p.size + 1
    from matchers p
    join keys c
      on c.matcher && p.matcher
     and not p.matchlist @> array[c.partner_id]
), largest as (
    select distinct sort(matchlist) as matchlist
    from matchers m
    where not exists (select 1
                      from matchers
                      where matchlist @> m.matchlist
                        and size > m.size)
    -- and size > 1
)
select *
from largest;
matchlist
{1,2,3,5,6}
{4}
fiddle
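Note that sort() on an integer array comes from the intarray extension. If that extension is unavailable, a core-only substitute for sort(matchlist) (my suggestion, not part of the original answer) is:

(select array_agg(x order by x) from unnest(matchlist) as t(x))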
EDIT: Since the recursive version did not perform well, here is an iterative example in plpgsql that uses a temporary table:
create temporary table match1 (
    partner_id int not null,
    group_id int not null,
    matchkey uuid not null
);

create index on match1 (matchkey);
create index on match1 (group_id);

insert into match1
select partner_id, partner_id, md5('n_'||name)::uuid from partners
union all
select partner_id, partner_id, md5('c_'||company_id)::uuid from partners
union all
select partner_id, partner_id, md5('v_'||vat_id)::uuid from partners;
do $$
declare
    _cnt bigint;
begin
    loop
        with consolidate as (
            select group_id,
                   min(group_id) over (partition by matchkey) as new_group_id
            from match1
        ), minimize as (
            select group_id, min(new_group_id) as new_group_id
            from consolidate
            group by group_id
        ), doupdate as (
            update match1
               set group_id = m.new_group_id
              from minimize m
             where m.group_id = match1.group_id
               and m.new_group_id != match1.group_id
            returning *
        )
        select count(*) into _cnt from doupdate;
        if _cnt = 0 then
            exit;
        end if;
    end loop;
end;
$$;
updated fiddle
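Once the loop settles, the groups can be read straight out of match1. A hypothetical follow-up query (not part of the original answer):

select group_id,
       array_agg(distinct partner_id order by partner_id) as partners
from match1
group by group_id
order by group_id;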

Postgresql not choosing rows grouping

I have a query that uses a construction like this example (online demo): you will see the created_at field in the result. I have to query on the created_at field, so right now it also ends up in the select list. I don't want the created_at field in the select, because there are millions of records in the deposits table. How can I avoid this problem?
(Note: I have many tables to query that are like the deposits table; this is just a short example.)
create table payment_methods
(
    payment_method_id bigserial not null
        constraint payment_methods_pkey primary key
);

create table currencies_of_payment_methods
(
    copm_id bigserial not null
        constraint currencies_of_payment_methods_pkey primary key,
    payment_method_id integer not null
);

create table deposits
(
    deposit_id bigserial not null
        constraint deposits_pkey primary key,
    amount numeric(18,2) not null,
    copm_id integer not null,
    created_at timestamp(0)
);
INSERT INTO payment_methods (payment_method_id) VALUES (1);
INSERT INTO payment_methods (payment_method_id) VALUES (2);
INSERT INTO currencies_of_payment_methods (copm_id, payment_method_id) VALUES (1, 1);
INSERT INTO deposits (amount, copm_id, created_at) VALUES (100, 1, '2020-09-10 08:49:37');
INSERT INTO deposits (amount, copm_id, created_at) VALUES (200, 1, '2020-09-10 08:49:37');
INSERT INTO deposits (amount, copm_id, created_at) VALUES (40, 1, '2020-09-10 08:49:37');
Query:
SELECT payment_methods.payment_method_id,
       deposit_copm_id.deposit_copm_id,
       manuel_deposit_amount.manuel_deposit_amount,
       manuel_deposit_amount.created_at
FROM payment_methods
CROSS JOIN LATERAL
    (SELECT currencies_of_payment_methods.copm_id AS deposit_copm_id
     FROM currencies_of_payment_methods
     WHERE currencies_of_payment_methods.payment_method_id = payment_methods.payment_method_id) deposit_copm_id
CROSS JOIN LATERAL
    (SELECT sum(deposits.amount) AS manuel_deposit_amount,
            array_agg(deposits.created_at) AS created_at
     FROM deposits
     WHERE deposits.copm_id = deposit_copm_id.deposit_copm_id) manuel_deposit_amount
WHERE payment_methods.payment_method_id = 1
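One reading of the question: if created_at is only needed to restrict which deposits are summed, the filter can live in the WHERE clause of the lateral subquery instead of being returned, so it never appears in the select list. A minimal sketch under that assumption (the date cutoff is made up for illustration):

SELECT payment_methods.payment_method_id,
       deposit_copm_id.deposit_copm_id,
       manuel_deposit_amount.manuel_deposit_amount
FROM payment_methods
CROSS JOIN LATERAL
    (SELECT copm_id AS deposit_copm_id
     FROM currencies_of_payment_methods
     WHERE payment_method_id = payment_methods.payment_method_id) deposit_copm_id
CROSS JOIN LATERAL
    (SELECT sum(amount) AS manuel_deposit_amount
     FROM deposits
     WHERE copm_id = deposit_copm_id.deposit_copm_id
       AND created_at >= '2020-09-01') manuel_deposit_amount -- hypothetical cutoff
WHERE payment_methods.payment_method_id = 1;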

PostgreSQL count other values of ID that have the same value of other column

Let's say we have the following table that stores id of an observation and its address_id. You can create the table with the following code:
drop table if exists schema.pl_address_cnt;

create table schema.pl_address_cnt (
    id serial,
    address_id int
);

insert into schema.pl_address_cnt(address_id) values
    (100), (101), (100), (101), (100), (125), (128), (200), (200), (100);
My task is to count, for each id, how many other ids (hence the -1) have the same address_id. I've come up with a solution that turns out to be quite expensive (explain) on the original dataset. I wonder whether my solution can somehow be optimised.
with tmp_table as (
    select address_id,
           count(distinct id) as id_count
    from schema.pl_address_cnt
    group by address_id
)
select id,
       id_count - 1
from schema.pl_address_cnt as pac
left join tmp_table as tt on tt.address_id = pac.address_id;
You can omit the CTE and instead do a self left join on a common address but a different ID, then aggregate:
SELECT pac1.id,
       count(pac2.id)
FROM pl_address_cnt pac1
LEFT JOIN pl_address_cnt pac2
       ON pac1.address_id = pac2.address_id
      AND pac1.id <> pac2.id
GROUP BY pac1.id
ORDER BY pac1.id;
For performance you can try indexes on (address_id, id) and (id).
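Another option worth benchmarking, a sketch of mine rather than part of the answer above: a window function counts the rows per address_id in a single pass, avoiding both the CTE and the self join:

SELECT id,
       count(*) OVER (PARTITION BY address_id) - 1 AS id_count
FROM schema.pl_address_cnt
ORDER BY id;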

Join two tables with count from first table

I know there is an obvious answer to this question, but I'm like a noob trying to remember how to write queries. I have the following table structure in Postgresql:
CREATE TABLE public.table1 (
    accountid BIGINT NOT NULL,
    rpt_start DATE NOT NULL,
    rpt_end DATE NOT NULL,
    CONSTRAINT table1_pkey PRIMARY KEY(accountid, rpt_start, rpt_end)
) WITH (oids = false);

CREATE TABLE public.table2 (
    customer_id BIGINT NOT NULL,
    read VARCHAR(255),
    CONSTRAINT table2 PRIMARY KEY(customer_id)
) WITH (oids = false);
The objective of the query is to display a result set of accountids, the count of rows per accountid in table1, and read from table2. The join is on table1.accountid = table2.customer_id.
The result set should appear as follows:
accountid  count  read
1234       2      100
1235       9      110
1236       1      91
The count column reflects the number of rows in table1 for each accountid. The read column is a value from table2 associated with the same accountid.
select accountid, "count", read
from (
    select accountid, count(*) as "count"
    from table1
    group by accountid
) t1
inner join table2 t2 on t1.accountid = t2.customer_id
order by accountid;
SELECT table2.customer_id, COUNT(table1.accountid), table2.read
FROM table2
LEFT JOIN table1 ON table2.customer_id = table1.accountid
GROUP BY table2.customer_id, table2.read;
SELECT t2.customer_id, t2.read, COUNT(*) AS the_count
FROM table2 t2
JOIN table1 t1 ON t1.accountid = t2.customer_id
GROUP BY t2.customer_id, t2.read;