PostgreSQL joining using JSONB with TypeORM - postgresql

I have this SQL
CREATE TABLE product(id SERIAL PRIMARY KEY, name text, categories JSONB);
INSERT INTO product(name, categories) VALUES
('product1', '{"ids":[4,5]}'),
('product2', '{"ids":[5,6]}'),
('product3', '{"ids":[7]}');
CREATE TABLE category(id bigint, rootid bigint);
INSERT INTO category(id, rootid) VALUES
(1, null),
(2, null),
(3, null),
(4, 1),
(5, 2),
(6, 1),
(7, 3);
I want to make this query with TypeORM, but I have no idea how to express the jsonb_array_elements_text(p.categories->'ids') pc(categoryid) ON TRUE part with TypeORM.
SELECT p.id, p.name, p.categories
FROM product p
INNER JOIN jsonb_array_elements_text(p.categories->'ids') pc(categoryid) ON TRUE
INNER JOIN category c ON pc.categoryid::bigint = c.id AND c.rootid = 1000;
Alternatively, I was trying another query, but it is too slow when I use jsonb_array_elements_text(categories->'ids'). Why does that happen?
SELECT p.id, p.name, p.categories
FROM product p
INNER JOIN (SELECT id, jsonb_array_elements_text(categories->'ids') categoryid FROM product) pc ON p.id = pc.id
INNER JOIN category c ON pc.categoryid::bigint = c.id AND c.rootid = 1000;

PostgreSQL has an additional index type, GIN, for JSON and JSONB columns. For best performance, you should create such an index on this JSONB field.
For example:
CREATE INDEX product_category_json_index ON product USING gin (categories jsonb_path_ops);
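Note that a jsonb_path_ops GIN index is used by the @> containment operator rather than by jsonb_array_elements_text, so a filter like the following (checking whether category 5 appears in the ids array) can take advantage of it:
-- can use the GIN index above via jsonb containment
SELECT id, name, categories
FROM product
WHERE categories @> '{"ids": [5]}';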
And I wrote an alternative query for you:
select main.*, cat.*
from (
select p.*, jsonb_array_elements_text(p.categories->'ids')::integer as category_id
from product p
) main
inner join category cat on cat.id = main.category_id;
To give more detailed advice for these tables, it would help to know how many records are in each of them (product and category).
Then I could insert the same amount of sample data into my local tables for testing and analysis.
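In the meantime, here is one rough way to generate bulk test data with generate_series; the volumes (10,000 categories, of which the first 100 are roots, and 100,000 products) are placeholders, since the real row counts aren't known:
-- placeholder volumes; adjust to match the real row counts
INSERT INTO category(id, rootid)
SELECT g, CASE WHEN g <= 100 THEN NULL ELSE (g % 100) + 1 END
FROM generate_series(1, 10000) AS g;
INSERT INTO product(name, categories)
SELECT 'product' || g,
jsonb_build_object('ids', jsonb_build_array(1 + (random() * 9999)::int, 1 + (random() * 9999)::int))
FROM generate_series(1, 100000) AS g;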


JOIN with array of ids returns duplicate root records instead of just one

I'm trying to join several tables and pull out each DISTINCT root record (from table_a), but for some reason I keep getting duplicates. Here is my select query:
select
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
This returns the following:
[
{
"id": 2,
"tableName": "Root record 2"
},
{
"id": 2,
"tableName": "Root record 2"
}
]
But I am only expecting, in this case,
[
{
"id": 2,
"tableName": "Root record 2"
}
]
What am I doing wrong here?
Just in case, here are the create and insert statements:
create schema if not exists my_schema;
create table if not exists my_schema.table_a (
id serial primary key,
table_a_name varchar (255) not null
);
create table if not exists my_schema.table_b (
id serial primary key,
table_a_id bigint not null references my_schema.table_a (id)
);
create table if not exists my_schema.table_d (
id serial primary key
);
create table if not exists my_schema.table_c (
id serial primary key,
table_b_id bigint not null references my_schema.table_b (id),
table_d_ids bigint[] not null
);
insert into my_schema.table_a values
(1, 'Root record 1'),
(2, 'Root record 2'),
(3, 'Root record 3');
insert into my_schema.table_b values
(10, 2),
(11, 2),
(12, 3);
insert into my_schema.table_d values
(100),
(101),
(102),
(103),
(104);
insert into my_schema.table_c values
(1000, 10, array[]::int[]),
(1001, 10, array[100]),
(1002, 11, array[100, 101]),
(1003, 12, array[102]),
(1004, 12, array[103]);
The short answer is to use distinct, and this will get the results you want:
select distinct
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
That said, this doesn't sit well with me because I assume this is not the end of your query.
The root issue is that you have two records from table_b - table_d that match this criteria. If you follow the breadcrumbs back, you will see there really are two matches:
select
ta.id,
ta.table_a_name as "tableName", tb.*, tc.*, td.*
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
So 'distinct' is just a lazy fix to say if there are dupes, limit it to one...
My next question is, is there more to it than this? What's supposed to happen next? Do you really just want candidates from table_a, or is this part 1 of a longer issue? If there is more to it, then there is likely a better solution than a simple select distinct.
-- edit 10/1/2022 --
Based on your comment, I have one final suggestion. Because this really is all there is to your output AND you don't actually need the data from the b/c/d tables, I think a semi-join is a better solution.
It's slightly more code (not going to win any golf or de-obfuscation contests), but it's much more efficient than a distinct or group by all columns. The reason is that a distinct pulls every row result and then has to sort and remove dupes. A semi-join, by contrast, will "stop looking" once it finds a match. It also scales very well. Almost every time I see a distinct misused, it's better served by a semi-join.
select
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
where exists (
select null
from
my_schema.table_b tb,
my_schema.table_c tc,
my_schema.table_d td
where
tb.table_a_id = ta.id and
tc.table_b_id = tb.id and
td.id = any(tc.table_d_ids) and
td.id = any(array[100])
)
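For the sample data in the question, this returns just the single root record:
[
{
"id": 2,
"tableName": "Root record 2"
}
]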
I didn't suggest this initially because I was unclear on the "what next."

How to JOIN tables without extra duplicates in multiple one-to-many relationship

I have two child tables with a one-to-many relationship to the parent table. I want to join them without extra duplication.
In the actual schema, there are more one-to-many relationships to this parent and to the child tables. I'm sharing a simplified schema to make the root of the problem easy to see.
Any suggestion is highly appreciated.
CREATE TABLE computer (
id SERIAL PRIMARY KEY,
name TEXT
);
CREATE TABLE c_user (
id SERIAL PRIMARY KEY,
computer_id INT REFERENCES computer,
name TEXT
);
CREATE TABLE c_accessories (
id SERIAL PRIMARY KEY,
computer_id INT REFERENCES computer,
name TEXT
);
INSERT INTO computer (name) VALUES ('HP'), ('Toshiba'), ('Dell');
INSERT INTO c_user (computer_id, name) VALUES (1, 'John'), (1, 'Elton'), (1, 'David'), (2, 'Ali');
INSERT INTO c_accessories (computer_id, name) VALUES (1, 'mouse'), (1, 'keyboard'), (1, 'mouse'), (2, 'mouse'), (2, 'printer'), (2, 'monitor'), (3, 'speaker');
This is my query:
SELECT
c.id
,c.name
,jsonb_agg(c_user.name)
,jsonb_agg(c_accessories.name)
FROM
computer c
JOIN
c_user ON c_user.computer_id = c.id
JOIN
c_accessories ON c_accessories.computer_id = c.id
GROUP BY c.id
I'm getting this result:
1 "HP" ["John", "John", "John", "Elton", "Elton", "Elton", "David", "David", "David"] ["mouse", "keyboard", "mouse", "mouse", "keyboard", "mouse", "mouse", "keyboard", "mouse"]
2 "Toshiba" ["Ali", "Ali", ""Ali"] ["monitor", "printer", "mouse"]
I want to get this result (preserving duplicates if they exist in the database), and also be able to filter computers by user and/or accessory:
1 "HP" ["John", "Elton", "David"] ["keyboard", "mouse", "mouse"]
2 "Toshiba" ["Ali"] ["monitor", "printer", "mouse"]
3 "Dell" Null ["speaker"]
Use subqueries instead of joins:
SELECT
c.id,
c.name,
(SELECT
jsonb_agg(c_user.name)
FROM c_user
WHERE c_user.computer_id = c.id
) AS user_names,
(SELECT
jsonb_agg(c_accessories.name)
FROM c_accessories
WHERE c_accessories.computer_id = c.id
) AS accessory_names
FROM
computer c
Join the users to a derived table that does the joining and aggregation of computers and accessories and aggregate again.
SELECT ca.id,
jsonb_agg(u.name) AS users,
ca.accessories
FROM (SELECT c.id,
jsonb_agg(a.name) AS accessories
FROM computer AS c
LEFT JOIN c_accessories AS a
ON a.computer_id = c.id
GROUP BY c.id) AS ca
INNER JOIN c_user AS u
ON u.computer_id = ca.id
GROUP BY ca.id,
ca.accessories;
You could also first aggregate, including the IDs of the users and accessories so that you can use DISTINCT in the aggregate function, for example into arrays of records, and then reaggregate to JSON in subqueries.
SELECT c.id,
(SELECT jsonb_agg(x.name)
FROM unnest(array_agg(DISTINCT row(u.id, u.name))) AS x
(id integer,
name text)) AS users,
(SELECT jsonb_agg(x.name)
FROM unnest(array_agg(DISTINCT row(a.id, a.name))) AS x
(id integer,
name text)) AS accessories
FROM computer AS c
LEFT JOIN c_accessories AS a
ON a.computer_id = c.id
INNER JOIN c_user AS u
ON u.computer_id = c.id
GROUP BY c.id;
Aggregate first, then join the computer table to the result of the aggregation.
select c.id, c.name,
cu.users,
ca.accessories
from computer c
left join (
select computer_id, jsonb_agg(name) as users
from c_user
group by computer_id
) as cu on cu.computer_id = c.id
left join (
select computer_id, jsonb_agg(name) as accessories
from c_accessories
group by computer_id
) as ca on ca.computer_id = c.id
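The question also asks about filtering computers by user and/or accessory. With this aggregate-then-join shape, one option (a sketch, using 'John' as an assumed filter value) is to add an EXISTS condition against the base tables, so the filter doesn't re-introduce duplicate rows:
select c.id, c.name, cu.users, ca.accessories
from computer c
left join (
select computer_id, jsonb_agg(name) as users
from c_user
group by computer_id
) as cu on cu.computer_id = c.id
left join (
select computer_id, jsonb_agg(name) as accessories
from c_accessories
group by computer_id
) as ca on ca.computer_id = c.id
-- keep only computers that have a user named 'John' (assumed filter value)
where exists (
select 1
from c_user u
where u.computer_id = c.id
and u.name = 'John'
);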

Joining two one-to-many tables duplicates records

I have 3 tables, Transaction, Transaction_Items and Transaction_History.
Where the Transaction is the parent table, while Transaction_Items and Transaction_History are the children tables, with one to many relationship.
When I try to join those tables together, if I have 2+ Transaction_History records or 2+ Transaction_Items, I get duplicated or triplicated results.
This is the SQL query I'm currently using, which works, but what worries me is that in the future, if I have to join another one-to-many table, it will duplicate the results again.
I found a workaround for this, but I was just wondering if there is a better and cleaner way to do it?
The results should be a PostgreSQL JSON array which will contain the Transaction_Items and Transaction_History
SELECT
TR.id AS transaction_id,
TR.transaction_number,
TR.status,
TR.status AS status,
to_json(TR_INV.list),
COUNT(TR_INV) item_cnt,
COUNT(THR) tr_cnt,
json_agg(THR)
FROM transaction_transaction AS TR
LEFT JOIN (
SELECT
array_agg(t) list, -- this is a workaround method
t.transaction_id
FROM (
SELECT
TR_INV.transaction_id transaction_id,
IT.id,
IT.stock_number,
CAT.key category_key,
ITP.description description,
ITP.serial_number serial_number,
ITP.color color,
ITP.manufacturer manufacturer,
ITP.inventory_model inventory_model,
ITP.average_cost average_cost,
ITP.location_in_store location_in_store,
ITP.firearm_caliber firearm_caliber,
ITP.federal_firearm_number federal_firearm_number,
ITP.sold_price sold_price
FROM transaction_transaction_item TR_INV
LEFT JOIN inventory_item IT ON IT.id = TR_INV.item_id
LEFT JOIN inventory_itemprofile ITP ON ITP.id = IT.current_profile_id
LEFT JOIN inventory_category CAT ON CAT.id = ITP.category_id
LEFT JOIN inventory_categorytype CAT_T ON CAT_T.id = CAT.category_type_id
) t
GROUP BY t.transaction_id
) TR_INV ON TR_INV.transaction_id = TR.id
LEFT JOIN transaction_transactionhistory THR ON THR.transaction_id = TR.id
AND (THR.audit_code_id = 44 OR THR.audit_code_id = 27 OR THR.audit_code_id = 28)
WHERE TR.store_id = 21
AND TR.transaction_type = 'Pawn_Loan' AND TR.date_made >= '2018-10-08'
GROUP BY TR.id, TR_INV.list
What you want to do can be achieved by not using joins, as shown below.
Your actual tables have many columns that I don't know about and shouldn't need to care about, so I just created the simplest forms of them for demonstration.
CREATE TABLE transactions (
tid serial PRIMARY KEY,
name varchar(40) NOT NULL
);
CREATE TABLE transaction_histories (
hid serial PRIMARY KEY ,
tid integer REFERENCES transactions(tid),
history varchar(40) NOT NULL
);
CREATE TABLE transaction_items (
iid serial PRIMARY KEY ,
tid integer REFERENCES transactions(tid),
item varchar(40) NOT NULL
);
INSERT INTO transactions(tid,name) Values(1, 'transaction');
INSERT INTO transaction_histories(tid, history) Values(1, 'history1');
INSERT INTO transaction_histories(tid, history) Values(1, 'history2');
INSERT INTO transaction_items(tid, item) Values(1, 'item1');
INSERT INTO transaction_items(tid, item) Values(1, 'item2');
select
t.*,
(select count(*) from transaction_histories h where h.tid= t.tid) h_count ,
(select json_agg(h) from transaction_histories h where h.tid= t.tid) h ,
(select count(*) from transaction_items i where i.tid= t.tid) i_count ,
(select json_agg(i) from transaction_items i where i.tid= t.tid) i
from transactions t;
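For the sample rows above, this returns a single row per transaction, with the children counted and aggregated as JSON rather than multiplied by a join:
tid: 1, name: 'transaction'
h_count: 2, h: [{"hid":1,"tid":1,"history":"history1"}, {"hid":2,"tid":1,"history":"history2"}]
i_count: 2, i: [{"iid":1,"tid":1,"item":"item1"}, {"iid":2,"tid":1,"item":"item2"}]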

Postgresql : How to join with multiple cross-reference table?

I've seen a lot of posts on multiple JOINs, but they didn't help in my case.
Consider that I have three tables and two cross-reference tables. That's the difference from the other posts, where there were multiple tables but only one cross-reference table in the FROM.
Table1 -> cross-ref1 <- table2 -> cross-ref2 <- table3
My PostgreSQL version is 9.0.11, and I'm working on Windows 7 64-bit.
My query is the following:
Select [columns] from cross-ref1, cross-ref2
INNER JOIN table1 ON table1.id_table1=cross-ref1.ref_id_table1
INNER JOIN table2 ON table2.id=cross-ref1.ref_id_table2
INNER JOIN table2 On table2.id_table2=cross-ref2.ref_id_table2
INNER JOIN table3 ON table3.id_table3=cross-ref2.ref_id_table3
The error message is: "Table name is specified more than once."
Can you explain the error to me?
Thanks
Cross-reference tables need separate columns for each side of the reference. An xref table with just one column makes no sense, as it can only refer to rows with the same ID on each side.
A typical setup would be:
CREATE TABLE a (
id integer primary key,
avalue text not null
);
CREATE TABLE b (
id integer primary key,
bvalue text not null
);
CREATE TABLE ab (
a_id integer references a(id),
b_id integer references b(id),
PRIMARY KEY(a_id, b_id)
);
Given sample data:
INSERT INTO a(id, avalue) VALUES
(1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4');
INSERT INTO b(id, bvalue) VALUES
(41, 'b1'), (42, 'b2'), (43, 'b3');
INSERT INTO ab(a_id, b_id) VALUES
(1, 41), (1, 42), (2, 43);
You'd find the pairings of a and b with:
SELECT avalue, bvalue
FROM a
INNER JOIN ab ON (a.id = ab.a_id)
INNER JOIN b ON (b.id = ab.b_id);
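With the sample data above, this returns:
avalue | bvalue
a1 | b1
a1 | b2
a2 | b3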
The crucial thing here is that you're joining on ab.a_id on the a side, and ab.b_id on the b side. Observe demo here: http://sqlfiddle.com/#!12/3228a/1
This is pretty much "many-to-many table relationships 101", so it might be worth doing some more study of introductory SQL and relational database tutorials and documentation.
You can't use the same table name twice (table2). In this case you need to use aliases like t1, t2a, t2b, ...
SELECT
...
FROM
table1 AS t1
INNER JOIN table2 AS t2a
ON t2a.id= ...
INNER JOIN table2 AS t2b
ON t2b.id= ...
INNER JOIN table3 AS t3
ON t3.id= ...
...
Now you can join whatever you want, how many times you want etc.
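Applied to the chain in the question (table and column names assumed from your query, with the hyphens dropped so the identifiers don't need quoting), table2 only needs to appear once, joined to both cross-reference tables:
SELECT t1.*, t2.*, t3.*
FROM table1 t1
INNER JOIN crossref1 x1 ON x1.ref_id_table1 = t1.id_table1
INNER JOIN table2 t2 ON t2.id_table2 = x1.ref_id_table2
INNER JOIN crossref2 x2 ON x2.ref_id_table2 = t2.id_table2
INNER JOIN table3 t3 ON t3.id_table3 = x2.ref_id_table3;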
You have to explain what result you want. For example, the following SQL is valid from a syntax point of view, though I'm not sure about the business logic:
-- this will create sample data with 5 tables
with
crossref1(ref_id) as (VALUES (1),(2),(3)),
crossref2 (ref_id) as (VALUES (2),(3),(4)),
table1 (ref_id) as (VALUES (3),(4),(5)),
table2 (ref_id) as (VALUES (1),(2),(3)),
table3 (ref_id) as (VALUES (1),(2),(3))
-- valid SQL based on your example
select * from
crossref1
cross join crossref2
join table1 on table1.ref_id=crossref1.ref_id
join table2 as t2_1 on t2_1.ref_id=crossref1.ref_id
join table2 as t2_2 on t2_2.ref_id=crossref2.ref_id
join table3 on table3.ref_id=crossref2.ref_id
With your SQL there are two problems:
You have two references to table2, so you have to add aliases
You have to use cross join syntax instead of a comma in the FROM list
If you would like to understand how WITH works (that's how I created the sample data), PostgreSQL has excellent documentation on it.

Concatenated columns should not match in 2 tables

I'll just put this in layman's terms since I'm a complete noobie:
I have 2 tables A and B, both having 2 columns of interest namely: employee_number and salary.
What I am looking to do is to extract rows of 'combination' of employee_number and salary from A that are NOT present in B, but each of employee_number and salary should be present in both.
I am looking to do it with the 2 following conditions (please forgive the wrong function names... this is just to present the problem 'eloquently'):
1.) A.unique(employee_number) exists in B.unique(employee_number) AND A.unique(salary)
exists in B.unique(salary)
2.) A.concat(employee_number,salary) <> B.concat(employee_number,salary)
Note: A and B are in different databases, so I'm looking to use dblink to do this.
This is what I tried doing:
SELECT distinct * FROM dblink('dbname=test1 port=5432
host=test01 user=user password=password','SELECT employee_number,salary, employee_number||salary AS ENS FROM empsal.A')
AS A(employee_number int8, salary integer, ENS numeric)
LEFT JOIN empsalfull.B B on B.employee_number = A.employee_number AND B.salary = A.salary
WHERE A.ENS not in (select distinct employee_number || salary from empsalfull.B)
but it turned out to be wrong, as I cross-checked it using spreadsheets and I don't get the same result.
Any help would be greatly appreciated. Thanks.
For easier understanding I left out the dblink.
Your first condition selects rows in B that match both the employee_number and the salary in A, so their concatenated values will be equal as well (if you expect this not to be true, please provide some test data).
SELECT * from firsttable A
LEFT JOIN secondtable B ON
(A.employee_number = B.employee_number AND A.salary != B.salary) OR
(A.salary = B.salary AND A.employee_number != B.employee_number)
If you have trouble with rows containing NULLs, you might also try something like this:
AND (a.salary != b.salary OR (a.salary IS NULL AND b.salary IS NOT NULL) OR (a.salary IS NOT NULL AND b.salary IS NULL))
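Alternatively, PostgreSQL's IS DISTINCT FROM comparison treats NULL like an ordinary value, which covers that NULL handling more concisely:
-- true when the salaries differ, including when exactly one of them is NULL
AND a.salary IS DISTINCT FROM b.salary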
I think you're looking for something along these lines.
(Sample data)
create table A (
employee_number integer primary key,
salary integer not null
);
create table B (
employee_number integer primary key,
salary integer not null
);
insert into A values
(1, 20000),
(2, 30000),
(3, 20000); -- This row isn't in B
insert into B values
(1, 20000), -- Combination in A
(2, 20000), -- Individual values in A
(3, 50000); -- Only emp number in A
select A.employee_number, A.salary
from A
where (A.employee_number, A.salary) NOT IN (select employee_number, salary from B)
and A.employee_number IN (select employee_number from B)
and A.salary IN (select salary from B)
output: 3, 20000
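Note that salary is declared not null in this sample, so NOT IN is safe here; if the real tables allow NULL salaries, NOT IN behaves unintuitively with NULLs, and a NOT EXISTS form (a sketch under the same assumptions) avoids that:
select A.employee_number, A.salary
from A
where not exists (
select 1
from B
where B.employee_number = A.employee_number
and B.salary = A.salary
)
and A.employee_number in (select employee_number from B)
and A.salary in (select salary from B);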