PostgreSQL: How to join with multiple cross-reference tables?

I've seen a lot of posts on multiple JOINs, but none of them helped in my case.
Consider that I have three tables and two cross-reference tables. That's the difference from the other posts, where they had multiple tables but only one cross-reference table in the FROM clause.
Table1 -> cross-ref1 <- table2 -> cross-ref2 <- table3
My PostgreSQL version is 9.0.11, and I'm working on Windows 7 64-bit.
My query is the following:
Select [columns] from cross-ref1, cross-ref2
INNER JOIN table1 ON table1.id_table1=cross-ref1.ref_id_table1
INNER JOIN table2 ON table2.id=cross-ref1.ref_id_table2
INNER JOIN table2 On table2.id_table2=cross-ref2.ref_id_table2
INNER JOIN table3 ON table3.id_table3=cross-ref2.ref_id_table3
The error message is : "Table name is specified more than once."
Can you explain the error to me?
Thanks

Cross-reference tables need separate columns for each side of the reference. An xref table with just one column makes no sense, as it can only refer to rows with the same ID on each side.
A typical setup would be:
CREATE TABLE a (
id integer primary key,
avalue text not null
);
CREATE TABLE b (
id integer primary key,
bvalue text not null
);
CREATE TABLE ab (
a_id integer references a(id),
b_id integer references b(id),
PRIMARY KEY(a_id, b_id)
);
Given sample data:
INSERT INTO a(id, avalue) VALUES
(1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4');
INSERT INTO b(id, bvalue) VALUES
(41, 'b1'), (42, 'b2'), (43, 'b3');
INSERT INTO ab(a_id, b_id) VALUES
(1, 41), (1, 42), (2, 43);
You'd find the pairings of a and b with:
SELECT avalue, bvalue
FROM a
INNER JOIN ab ON (a.id = ab.a_id)
INNER JOIN b ON (b.id = ab.b_id);
The crucial thing here is that you're joining on ab.a_id on the a side, and ab.b_id on the b side. Observe demo here: http://sqlfiddle.com/#!12/3228a/1
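With the sample data above, that query returns three pairings:
 avalue | bvalue
--------+--------
 a1     | b1
 a1     | b2
 a2     | b3
(3 rows)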
This is pretty much "many-to-many table relationships 101", so it might be worth doing some more study of introductory SQL and relational database tutorials and documentation.

You can't use the same table name twice (table2). In this case you need to use aliases like t1, t2a, t2b, ...
SELECT
...
FROM
table1 AS t1
INNER JOIN table2 AS t2a
ON t2a.id= ...
INNER JOIN table2 AS t2b
ON t2b.id= ...
INNER JOIN table3 AS t3
ON t3.id= ...
...
Now you can join whatever you want, as many times as you want, and so on.
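Applied to the question's tables, the skeleton filled in might look like this (column names are taken from the question and hyphens are dropped from the cross-reference table names; if both table2 joins are meant to hit the same row, a single join of table2 is enough):
SELECT t1.*, t2a.*, t2b.*, t3.*
FROM table1 AS t1
INNER JOIN crossref1 AS x1 ON x1.ref_id_table1 = t1.id_table1
INNER JOIN table2 AS t2a ON t2a.id = x1.ref_id_table2
INNER JOIN crossref2 AS x2 ON x2.ref_id_table2 = t2a.id_table2
INNER JOIN table2 AS t2b ON t2b.id_table2 = x2.ref_id_table2
INNER JOIN table3 AS t3 ON t3.id_table3 = x2.ref_id_table3;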

You have to explain what result you want to get. For example, the following SQL is valid from a syntax point of view; I'm not sure about the business point of view:
-- this will create sample data with 5 tables
with
crossref1(ref_id) as (VALUES (1),(2),(3)),
crossref2 (ref_id) as (VALUES (2),(3),(4)),
table1 (ref_id) as (VALUES (3),(4),(5)),
table2 (ref_id) as (VALUES (1),(2),(3)),
table3 (ref_id) as (VALUES (1),(2),(3))
-- valid SQL based on your example
select * from
crossref1
cross join crossref2
join table1 on table1.ref_id=crossref1.ref_id
join table2 as t2_1 on t2_1.ref_id=crossref1.ref_id
join table2 as t2_2 on t2_2.ref_id=crossref2.ref_id
join table3 on table3.ref_id=crossref2.ref_id
With your SQL there are two problems:
You have two references to table2, so you have to add an alias for at least one of them.
You have to use explicit CROSS JOIN syntax instead of the comma.
If you would like to understand how WITH works (that's how I created the sample data), PostgreSQL has excellent documentation on it.

Related

JOIN with array of ids returns duplicate root records instead of just one

I'm trying to join several tables and pull out each DISTINCT root record (from table_a), but for some reason I keep getting duplicates. Here is my select query:
select
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
This returns the following:
[
{
"id": 2,
"tableName": "Root record 2"
},
{
"id": 2,
"tableName": "Root record 2"
}
]
But I am only expecting, in this case,
[
{
"id": 2,
"tableName": "Root record 2"
}
]
What am I doing wrong here?
Here's the fiddle and, just in case, the create and insert statements below:
create schema if not exists my_schema;
create table if not exists my_schema.table_a (
id serial primary key,
table_a_name varchar (255) not null
);
create table if not exists my_schema.table_b (
id serial primary key,
table_a_id bigint not null references my_schema.table_a (id)
);
create table if not exists my_schema.table_d (
id serial primary key
);
create table if not exists my_schema.table_c (
id serial primary key,
table_b_id bigint not null references my_schema.table_b (id),
table_d_ids bigint[] not null
);
insert into my_schema.table_a values
(1, 'Root record 1'),
(2, 'Root record 2'),
(3, 'Root record 3');
insert into my_schema.table_b values
(10, 2),
(11, 2),
(12, 3);
insert into my_schema.table_d values
(100),
(101),
(102),
(103),
(104);
insert into my_schema.table_c values
(1000, 10, array[]::int[]),
(1001, 10, array[100]),
(1002, 11, array[100, 101]),
(1003, 12, array[102]),
(1004, 12, array[103]);
The short answer is to use distinct, and this will get the results you want:
select distinct
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
That said, this doesn't sit well with me because I assume this is not the end of your query.
The root issue is that you have two rows across table_b through table_d that match these criteria. If you follow the breadcrumbs back, you will see there really are two matches:
select
ta.id,
ta.table_a_name as "tableName", tb.*, tc.*, td.*
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
So 'distinct' is just a lazy fix to say if there are dupes, limit it to one...
My next question is, is there more to it than this? What's supposed to happen next? Do you really just want candidates from table_a, or is this part 1 of a longer issue? If there is more to it, then there is likely a better solution than a simple select distinct.
-- edit 10/1/2022 --
Based on your comment, I have one final suggestion. Because this really is all there is to your output AND you don't actually need the data from the b/c/d tables, I think a semi-join is a better solution.
It's slightly more code (not going to win any golf or de-obfuscation contests), but it's much more efficient than a DISTINCT or a GROUP BY over all columns. The reason is that DISTINCT pulls every resulting row and then has to sort and remove duplicates. A semi-join, by contrast, will "stop looking" once it finds a match. It also scales very well. Almost every time I see DISTINCT misused, it's better served by a semi-join.
select
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
where exists (
select null
from
my_schema.table_b tb,
my_schema.table_c tc,
my_schema.table_d td
where
tb.table_a_id = ta.id and
tc.table_b_id = tb.id and
td.id = any(tc.table_d_ids) and
td.id = any(array[100])
)
I didn't suggest this initially because I was unclear on the "what next."

PostgreSQL joining using JSONB with TypeORM

I have this SQL
CREATE TABLE product(id SERIAL PRIMARY KEY, name text, categories JSONB);
INSERT INTO product(name, categories) VALUES
('prouct1', '{"ids":[4,5]}'),
('prouct2', '{"ids":[5,6]}'),
('prouct3', '{"ids":[7]}');
CREATE TABLE category(id bigint, rootid bigint);
INSERT INTO category(id, rootid) VALUES
(1, null),
(2, null),
(3, null),
(4, 1),
(5, 2),
(6, 1),
(7, 3);
I want to write this query with TypeORM, but I have no idea how to express the jsonb_array_elements_text(b.categories->'ids') pc(categoryid) ON TRUE part in TypeORM.
SELECT p.id, p.name, p.categories
FROM product p
INNER JOIN jsonb_array_elements_text(b.categories->'ids') pc(categoryid) ON TRUE
INNER JOIN category c ON pc.categoryid = c.categoryid AND c.rootid = 1000;
Alternatively, I was trying another query, but it is too slow when I include jsonb_array_elements_text(categories->'ids'). Why does that happen?
SELECT p.id, p.name, p.categories
FROM product p
INNER JOIN (SELECT id, jsonb_array_elements_text(categories->'ids') categoryid FROM product) pc ON p.id = pc.id
INNER JOIN category c ON pc.categoryid = c.categoryid AND c.rootid = 1000;
PostgreSQL has an additional index type, GIN, for the JSON and JSONB types. For best performance, you should create an index on this JSON field.
For example:
CREATE INDEX product_category_json_index ON product USING gin (categories jsonb_path_ops);
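Note that a GIN index with jsonb_path_ops serves the containment operator @>. A minimal sketch of a query that can use the index above (it matches products whose ids array contains 5):
SELECT id, name
FROM product
WHERE categories @> '{"ids": [5]}';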
And I wrote an alternative query for you:
select main.*, cat.*
from (
   select p.*, jsonb_array_elements_text(p.categories->'ids')::integer as category_id
   from product p
) main
inner join category cat on cat.id = main.category_id;
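For comparison, the lateral form from the original query can be written directly against these tables, with the aliases and the cast adjusted (rootid = 1 is just a value that matches the sample data):
SELECT p.id, p.name, p.categories
FROM product p
CROSS JOIN LATERAL jsonb_array_elements_text(p.categories->'ids') AS pc(categoryid)
JOIN category c ON c.id = pc.categoryid::bigint
WHERE c.rootid = 1;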
I'd like more detailed information about these tables: if you know, please tell me how many records are in both tables (product and category).
I want to insert sample data of the same size into my local tables for testing and analysis.

Merge in postgres

I am trying to convert the Oracle query below to Postgres:
MERGE INTO table1 g
USING (SELECT distinct g.CDD , d.SGR
from table2 g, table3 d
where g.IDF = d.IDF) f
ON (g.SGR = f.SGR and g.CDD = f.CDD)
WHEN NOT MATCHED THEN
INSERT (SGR, CDD)
VALUES (f.SGR, f.CDD);
I made the changes below to make it compatible with Postgres:
WITH f AS (
SELECT distinct g.CDD , d.SGR
from table2 g, table3 d
where g.IDF = d.IDF
),
upd AS (
update table1 g
set
SGR = f.SGR , CDD = f.CDD
FROM f where g.SGR = f.SGR and g.CDD = f.CDD
returning g.CDD, g.SGR
)
INSERT INTO table1(SGR, CDD ) SELECT f.SGR, f.CDD FROM f;
But I'm doubtful: my Oracle query does not update any columns when the data matches, and I'm unable to convert it accordingly. Can anyone help me correct it?
Assuming you have a primary (or unique) key on (sgr, cdd) you can convert this to an insert ... on conflict statement:
insert into table1 (SGR, CDD)
select distinct d.SGR, g.CDD
from table2 g
join table3 d ON g.IDF = d.IDF
on conflict (cdd, sgr) do nothing;
If you don't have a unique constraint (which begs the question: why not?) then a straightforward INSERT ... SELECT statement should work (which would have worked in Oracle as well).
WITH f AS (
SELECT distinct g.CDD, d.SGR
from table2 g
join table3 d on g.IDF = d.IDF
)
INSERT INTO table1 (SGR, CDD)
SELECT f.SGR, f.CDD
FROM f
WHERE NOT EXISTS (select *
                  from table1 t1
                  where (t1.sgr, t1.cdd) = (f.sgr, f.cdd));
Note that this is NOT safe for concurrent execution (and neither is Oracle's MERGE statement). You can still wind up with duplicate values in table1 (with regards to the combination of (sgr,cdd)).
The only sensible way to prevent duplicates is to create a unique index (or constraint) - which would enable you to use the much more efficient insert on conflict. You should really consider that if your business rules disallow duplicates.
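For example, a minimal sketch (the constraint name is only illustrative):
ALTER TABLE table1
  ADD CONSTRAINT table1_sgr_cdd_key UNIQUE (sgr, cdd);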
Note that I converted your ancient, implicit join in the WHERE clause to a modern, explicit JOIN operator, but it is not required for this to work.

T-SQL how to join with one column a string and one an integer

How do I join when one column is a string and the other is an integer?
--PEOPLE_ID 000092437, PersonID 92437
select PC.PEOPLE_ID, Idn.PersonId, 'Home Row 1', PC.Phone1
from #NextIdentityID Idn
inner join PEOPLECHANGES PC on Idn.People_ID = PC.People_ID  -- PEOPLE_ID 000092437, PersonID 92437: one is varchar, one is integer
union all
select PC.PEOPLE_ID, Idn.PersonId, 'Office Row 2', PC.Phone2
from #NextIdentityID Idn
inner join PEOPLECHANGES PC on Idn.People_ID = PC.People_ID
union all
select PC.PEOPLE_ID, Idn.PersonId, 'Cell Row 3', PC.Phone3
from #NextIdentityID Idn
inner join PEOPLECHANGES PC on Idn.People_ID = PC.People_ID
To make sure your varchar() data doesn't raise any errors you should check whether it can be converted to an integer. One way to do this is with a CASE expression in the join condition. If a value is not convertible then it won't join - but at least your query can still run without error.
This example shows how you can avoid potential errors.
create table #tempa(id int, descr varchar(50));
create table #tempb(id varchar(10), descr varchar(50));
insert into #tempa(id,descr) values (1234,'Body getta body getta');
insert into #tempb(id,descr) values ('001234','sis boom ba - rah rah rah');
insert into #tempa(id,descr) values (5678,'Weagle Weagle War Damn Eagle');
insert into #tempb(id,descr) values ('0005678','Kickem in the butt Big blue');
insert into #tempa(id,descr) values (9012,'this wont have a match');
insert into #tempb(id,descr) values ('x0912','sis boom ba');
Select a.id as a_id, b.id as b_id
,a.descr as a_descr, b.descr as b_descr
from #tempa a
left join #tempb b
on a.id = case when isnumeric(b.id) = 1 then cast(b.id as int) else 0 end
-- this one will raise an error
Select a.id as a_id, b.id as b_id
,a.descr as a_descr, b.descr as b_descr
from #tempa a
left join #tempb b
on a.id = b.id
drop table #tempa;
drop table #tempb;
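As a side note on the approach above: on SQL Server 2012 and later, TRY_CAST can replace the ISNUMERIC/CASE pattern, returning NULL (and therefore no match) for values that don't convert. A sketch against the same temp tables:
Select a.id as a_id, b.id as b_id
,a.descr as a_descr, b.descr as b_descr
from #tempa a
left join #tempb b
on a.id = try_cast(b.id as int);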
If you convert the one with leading zeros to an integer you will get equal values:
SELECT CONVERT(INT, '000092437');  -- returns 92437
However, this assumes that all of your varchar column values can be converted to int.
If that's not the case then you have to write a function to go the other way and add leading zeros.
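A sketch of that direction, padding PersonId with leading zeros (the width of nine characters is taken from the sample value '000092437' in the question):
-- pad the integer PersonId to 9 characters so it compares equal to PEOPLE_ID
select PC.PEOPLE_ID, Idn.PersonId
from #NextIdentityID Idn
inner join PEOPLECHANGES PC
on PC.People_ID = right(replicate('0', 9) + cast(Idn.PersonId as varchar(9)), 9);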

How to join vertical and horizontal tables together

I have two tables, one of which is vertical, i.e. it stores only key/value pairs with a reference id from table 1. I want to join both tables and display the key/value pairs as columns in the select, and also perform sorting on a few of the keys.
T1 having (id,empid,dpt)
T2 having (empid,key,value)
select
T1.*,
t21.value,
t22.value,
t23.value
from Table1 t1
join Table2 t21 on t1.empid = t21.empid
join Table2 t22 on t1.empid = t22.empid
join Table2 t23 on t1.empid = t23.empid
where
t21.key = 'FNAME'
and t22.key = 'LNAME'
and t23.key='AGE'
The query you demonstrate is very inefficient (another join for each additional column) and also has a potential problem: if there isn't a row in T2 for every key in the WHERE clause, the whole row is excluded.
The second problem can be avoided with LEFT [OUTER] JOIN instead of [INNER] JOIN. But don't bother, the solution to the first problem is a completely different query. "Pivot" T2 using crosstab() from the additional module tablefunc:
SELECT * FROM crosstab(
   'SELECT empid, key, value FROM t2 ORDER BY 1'
 , $$VALUES ('FNAME'), ('LNAME'), ('AGE')$$  -- more?
   ) AS ct (empid int  -- use *actual* data types
          , fname text
          , lname text
          , age text);  -- more?
Then just join to T1:
select *
from t1
JOIN (<insert query from above>) AS t2 USING (empid);
This time you may want to use [INNER] JOIN.
The USING clause conveniently removes the second instance of the empid column.
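Putting the two pieces together, the full query might look like this (column types are assumed, as above; crosstab() requires the tablefunc extension to be installed once per database):
-- CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT *
FROM t1
JOIN (
   SELECT * FROM crosstab(
      'SELECT empid, key, value FROM t2 ORDER BY 1'
    , $$VALUES ('FNAME'), ('LNAME'), ('AGE')$$
   ) AS ct (empid int, fname text, lname text, age text)
) AS kv USING (empid)
ORDER BY kv.lname, kv.fname;  -- sorting on selected keys, as the question asks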
Detailed instructions:
PostgreSQL Crosstab Query