Full outer join in postgres

I have the following table in PostgreSQL:
sch_code  sch_level  start_date  end_date    flag
1234      P          01-01-2018  31-01-2018  V
1234      S          01-01-2018  31-01-2018  V
5678      S          01-01-2018  31-01-2018  V
8965      P          01-01-2018  31-01-2018  V
The result I require is as follows:
sch_code  start_P     end_P       start_S     end_S
1234      01-01-2018  31-01-2018  01-01-2018  31-01-2018
5678      00-00-0000  00-00-0000  01-01-2018  31-01-2018
8965      01-01-2018  31-01-2018  00-00-0000  00-00-0000
The queries I tried with UNION did not produce this result. I have also tried using looped SELECT statements.

You need a FULL OUTER JOIN.
A WHERE filter on sch_level applied after the join would discard the rows where one side is NULL, so pre-filter the two data sets first using CTEs:
WITH pdata AS (
SELECT * FROM mytable WHERE sch_level='P'
),
sdata AS (
SELECT * FROM mytable WHERE sch_level='S'
)
SELECT
COALESCE(pdata.sch_code, sdata.sch_code) AS sch_code,
pdata.start_date AS start_P,
pdata.end_date AS end_P,
sdata.start_date AS start_S,
sdata.end_date AS end_S
FROM pdata
FULL OUTER JOIN sdata ON pdata.sch_code = sdata.sch_code
If you don't want NULL values in your date fields, simply use COALESCE and provide default values in whichever data type you might be using.
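
As a rough, runnable illustration of the COALESCE defaults, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. It is not the FULL OUTER JOIN query above: it swaps in conditional aggregation (MAX over a CASE per level), which gives the same pivot and runs on older SQLite builds too. It assumes at most one row per (sch_code, sch_level) and stores the dates as text; in PostgreSQL a real date column cannot hold 00-00-0000, so you would likely need a valid sentinel date or a cast to text.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; dates are kept as plain text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        sch_code INTEGER, sch_level TEXT,
        start_date TEXT, end_date TEXT, flag TEXT
    );
    INSERT INTO mytable VALUES
        (1234, 'P', '01-01-2018', '31-01-2018', 'V'),
        (1234, 'S', '01-01-2018', '31-01-2018', 'V'),
        (5678, 'S', '01-01-2018', '31-01-2018', 'V'),
        (8965, 'P', '01-01-2018', '31-01-2018', 'V');
""")

# Each CASE keeps the value for one level (NULL otherwise), MAX collapses
# the group to that single value, and COALESCE supplies the placeholder.
pivot_sql = """
    SELECT sch_code,
           COALESCE(MAX(CASE WHEN sch_level = 'P' THEN start_date END), '00-00-0000') AS start_p,
           COALESCE(MAX(CASE WHEN sch_level = 'P' THEN end_date END),   '00-00-0000') AS end_p,
           COALESCE(MAX(CASE WHEN sch_level = 'S' THEN start_date END), '00-00-0000') AS start_s,
           COALESCE(MAX(CASE WHEN sch_level = 'S' THEN end_date END),   '00-00-0000') AS end_s
    FROM mytable
    GROUP BY sch_code
    ORDER BY sch_code
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```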

Related

Querying in postgres for a range around an integer

In PostgreSQL, I'd like to determine whether a known integer is within a +/- range of another integer. What is the function for this query?
Example: In my dataset, I have 2 Tables:
Table_1
ID integer
1 2000
2 3000
3 4000
Table_2
ID integer
1 1995
2 3050
3 4100
For each ID-pair, I'd like to query whether Table_1.integer is +/- 25 of Table_2.integer.
The answers would be:
ID 1: TRUE
ID 2: FALSE
ID 3: FALSE
Any help is much appreciated. I am new to using postgresql and all programming languages in general.

We can try checking the absolute value of the difference between the two integer values, for each ID:
SELECT
t1.ID,
CASE WHEN ABS(t1.integer - t2.integer) <= 25 THEN 'TRUE' ELSE 'FALSE' END AS answer
FROM Table_1 t1
INNER JOIN Table_2 t2
ON t1.ID = t2.ID
ORDER BY
t1.ID;
If you want to just output the raw boolean value, then use:
SELECT
t1.ID,
ABS(t1.integer - t2.integer) <= 25 AS answer
FROM ...

This is almost the same as @Tim's solution but without the CASE expression, which is useful if you want to output boolean values.
SELECT t1.ID,ABS(t1.integer - t2.integer) <= 25 as res
FROM table_1 t1 JOIN table_2 t2
ON t1.ID = t2.ID;
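
To check the range test against the sample data, here is a small sketch using Python's built-in sqlite3 module in place of PostgreSQL. The column is renamed to val (a hypothetical name, chosen to avoid a column called integer), and the BETWEEN form is shown next to ABS only as an equivalent way to write the same check. SQLite reports booleans as 1 and 0.

```python
import sqlite3

# SQLite stands in for PostgreSQL; "val" is a hypothetical column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, val INTEGER);
    CREATE TABLE table_2 (id INTEGER, val INTEGER);
    INSERT INTO table_1 VALUES (1, 2000), (2, 3000), (3, 4000);
    INSERT INTO table_2 VALUES (1, 1995), (2, 3050), (3, 4100);
""")

# Both expressions test whether table_1.val lies within 25 of table_2.val.
range_rows = conn.execute("""
    SELECT t1.id,
           ABS(t1.val - t2.val) <= 25                  AS abs_check,
           t1.val BETWEEN t2.val - 25 AND t2.val + 25  AS between_check
    FROM table_1 t1
    JOIN table_2 t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

for row in range_rows:
    print(row)
```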

Conditional JOIN with two different keys

I have a query that produces two separate IDs:
SELECT
date,
user_id,
vendor_id,
SUM(purchase) user_purchase,
SUM(spend) vendor_spend
FROM tabla.abc
GROUP BY 1,2,3
This produces results like this:
date user_id vendor_id user_purchase vendor_spend
1/1/18 123 NULL 5.00 0.00
1/1/18 NULL 456 0.00 10.00
I want to join it on a table that looks like this:
client_id user_id vendor_id
456789 123 NULL
101112 NULL 456
But the problem is, I obviously want to join it on both of the appropriate IDs so my final output can look like this:
date client_id user_id vendor_id user_purchase vendor_spend
1/1/18 456789 123 NULL 5.00 0.00
1/1/18 101112 NULL 456 0.00 10.00
So is there a way I can do like, a conditional join? Something like WHERE user_id IS NULL THEN... etc...

Use IS NOT DISTINCT FROM, because one of the arguments may be NULL:
select *
from (
select
date,
user_id,
vendor_id,
sum(purchase) user_purchase,
sum(spend) vendor_spend
from table1
group by 1,2,3
) t1
join table2 t2
on (t1.user_id, t1.vendor_id)
is not distinct from (t2.user_id, t2.vendor_id)
Note that for performance reasons you should join the already aggregated table (hence I have placed the original query in a derived table).

Try this:
SELECT
a.date,
COALESCE(lu.client_id, lv.client_id) AS client_id,
a.user_id,
a.vendor_id,
SUM(a.purchase) AS user_purchase,
SUM(a.spend) AS vendor_spend
FROM tabla.abc AS a
LEFT JOIN tabla.link AS lu ON lu.user_id = a.user_id
LEFT JOIN tabla.link AS lv ON lv.vendor_id = a.vendor_id
GROUP BY 1,2,3,4

I think the sufficient join is just this:
FROM aggregated_table t1
LEFT JOIN client_id_table t2
ON t1.user_id=t2.user_id
OR t1.vendor_id=t2.vendor_id
because, as I understand it, you need to join by user ID when there is a user ID and by vendor ID when there is a vendor ID. A LEFT JOIN with OR does exactly that.
A conditional join is possible as well. A CASE expression works in join conditions, so the same thing can be expressed as:
FROM aggregated_table t1
LEFT JOIN client_id_table t2
ON CASE
WHEN t1.user_id is not null THEN t1.user_id=t2.user_id
WHEN t1.vendor_id is not null THEN t1.vendor_id=t2.vendor_id
END
but this is too verbose compared to the previous option that I think should produce the same result
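
To compare the null-safe join with the OR join on the sample rows, here is a sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table names agg and link are hypothetical, and agg is filled directly with the already aggregated rows from the question. SQLite's IS operator plays the role of PostgreSQL's IS NOT DISTINCT FROM here (it treats two NULLs as equal); that substitution is an assumption of this sketch, not PostgreSQL syntax.

```python
import sqlite3

# SQLite stands in for PostgreSQL; "agg" and "link" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agg (date TEXT, user_id INTEGER, vendor_id INTEGER,
                      user_purchase REAL, vendor_spend REAL);
    CREATE TABLE link (client_id INTEGER, user_id INTEGER, vendor_id INTEGER);
    INSERT INTO agg VALUES ('1/1/18', 123, NULL, 5.0, 0.0),
                           ('1/1/18', NULL, 456, 0.0, 10.0);
    INSERT INTO link VALUES (456789, 123, NULL), (101112, NULL, 456);
""")

base = """
    SELECT a.date, l.client_id, a.user_id, a.vendor_id
    FROM agg a
    LEFT JOIN link l ON {condition}
    ORDER BY l.client_id
"""

# Null-safe equality on both keys (SQLite's IS treats NULL IS NULL as true).
null_safe_rows = conn.execute(
    base.format(condition="a.user_id IS l.user_id AND a.vendor_id IS l.vendor_id")
).fetchall()

# Plain equality joined with OR: NULL = NULL is unknown, so only the
# non-NULL key can produce a match.
or_rows = conn.execute(
    base.format(condition="a.user_id = l.user_id OR a.vendor_id = l.vendor_id")
).fetchall()

print(null_safe_rows)
print(or_rows)
```

On this sample both conditions pick the same client for each row. They can differ on other data: the OR form also matches a link row that shares only one of two non-NULL keys, while the null-safe form requires both keys to agree.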

Correct sum with multiple subrecords (postgresql)

This may have been asked several times before, but I do not know how to get correct sums from both parent and child records.
Here are the tables:
CREATE TABLE co
(coid int4, coname text);
INSERT INTO co
(coid, coname)
VALUES
(1, 'Volvo'),
(2, 'Ford'),
(3, 'Jeep'),
(4, 'Toyota')
;
CREATE TABLE inv
(invid int4, invco int4, invsum numeric(10,2));
INSERT INTO inv
(invid, invco, invsum)
VALUES
(1,1,100),
(2,1,100),
(3,2,100),
(4,3,100),
(5,4,100)
;
CREATE TABLE po
(poid int4, poinv int4, posum int4);
INSERT INTO po
(poid, poinv, posum)
VALUES
(1,1,50),
(2,1,50),
(3,3,100),
(4,4,100)
;
I started with this simple query
SELECT coname, sum(invsum)
FROM inv
LEFT JOIN co ON coid=invco
GROUP BY 1
ORDER BY 1
Which gave a correct result:
coname sum
Ford 100
Jeep 100
Toyota 100
Volvo 200
Then I added the po record and the sums became incorrect:
SELECT coname, sum(posum) as po, sum(invsum)
FROM inv
LEFT JOIN co ON coid=invco
LEFT JOIN po ON poinv=invid
GROUP BY 1
ORDER BY 1
Which multiplied the sum for Volvo:
coname po sum
Ford 100 100
Jeep 100 100
Toyota (null) 100 (no records for po = correct)
Volvo 100 300 (wrong sum for inv)
How do I construct a query that gives correct result with multiple subrecords of po? (Window function?)
Sqlfiddle: http://sqlfiddle.com/#!15/0d90c/12

Do the aggregation before the joins. This is a little complicated in your case, because the relationship between co and po seems to require inv:
SELECT co.coname, p.posum, i.invsum
FROM co LEFT JOIN
(SELECT i.invco, sum(i.invsum) as invsum
FROM inv i
GROUP BY i.invco
) i
ON co.coid = i.invco LEFT JOIN
(SELECT i.invco, sum(po.posum) as posum
FROM po JOIN
inv i
ON po.poinv = i.invid
GROUP BY i.invco
) p
ON co.coid = p.invco
ORDER BY 1;
Note: I presume the logic is to keep everything in the co table, even if there are no matches in the other tables. The LEFT JOIN should start with this table, the one with all the rows you want to keep.
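
To see the fan-out and the fix side by side, here is a sketch that loads the question's tables into Python's built-in sqlite3 module (a stand-in for PostgreSQL) and runs both the original join and the aggregate-before-join query from this answer. The invsum column is declared as a plain INTEGER here for simplicity, which is an assumption of the sketch.

```python
import sqlite3

# SQLite stands in for PostgreSQL; invsum is simplified to INTEGER.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE co  (coid INTEGER, coname TEXT);
    CREATE TABLE inv (invid INTEGER, invco INTEGER, invsum INTEGER);
    CREATE TABLE po  (poid INTEGER, poinv INTEGER, posum INTEGER);
    INSERT INTO co  VALUES (1, 'Volvo'), (2, 'Ford'), (3, 'Jeep'), (4, 'Toyota');
    INSERT INTO inv VALUES (1, 1, 100), (2, 1, 100), (3, 2, 100), (4, 3, 100), (5, 4, 100);
    INSERT INTO po  VALUES (1, 1, 50), (2, 1, 50), (3, 3, 100), (4, 4, 100);
""")

# Joining first: invoice 1 appears twice (once per po row), so its
# invsum is counted twice for Volvo.
naive_rows = conn.execute("""
    SELECT coname, sum(posum) AS po, sum(invsum)
    FROM inv
    LEFT JOIN co ON coid = invco
    LEFT JOIN po ON poinv = invid
    GROUP BY 1
    ORDER BY 1
""").fetchall()

# Aggregating first: each derived table has one row per company,
# so the joins cannot multiply any amounts.
fixed_rows = conn.execute("""
    SELECT co.coname, p.posum, i.invsum
    FROM co
    LEFT JOIN (SELECT invco, sum(invsum) AS invsum
               FROM inv GROUP BY invco) i ON co.coid = i.invco
    LEFT JOIN (SELECT i.invco, sum(po.posum) AS posum
               FROM po JOIN inv i ON po.poinv = i.invid
               GROUP BY i.invco) p ON co.coid = p.invco
    ORDER BY 1
""").fetchall()

print(naive_rows)
print(fixed_rows)
```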

Update using left join in netezza

I need to perform a left join of two tables in Netezza during an update. How can I achieve this? A left join across three tables works, but not across two tables.
UPDATE table_1
SET c2 = t2.c2
FROM
table_1 t1
LEFT JOIN table_2 t2
ON t1.c1=t2.c1
LEFT JOIN table_3 t3
ON t2.c1=t3.c1
This works, but
UPDATE table_1
SET c2 = t2.c2
FROM table_1 t1
LEFT JOIN table_2 t2
ON t1.c1=t2.c1
fails with an error saying it is trying to update multiple columns.
Thanks,
Manirathinam.

When performing an UPDATE with a join in Netezza, it's important to understand that the table being updated is always implicitly INNER JOINed with the FROM list. This behavior is described in the Netezza documentation.
Your code is actually joining table_1 to itself (one copy with no alias, and one with t1 as an alias). Since there is no join criteria between those two versions of table_1, you are getting a cross join which is providing multiple rows that are trying to update table_1.
The best way to tackle an UPDATE with an OUTER join is to employ a subselect like this:
TESTDB.ADMIN(ADMIN)=> select * from table_1 order by c1;
C1 | C2
----+----
1 | 1
2 | 2
3 | 3
(3 rows)
TESTDB.ADMIN(ADMIN)=> select * from table_2 order by c1;
C1 | C2
----+----
1 | 10
3 | 30
(2 rows)
TESTDB.ADMIN(ADMIN)=> UPDATE table_1 t1
SET t1.c2 = foo.c2
FROM (
SELECT t1a.c1,
t2.c2
FROM table_1 t1a
LEFT JOIN table_2 t2
ON t1a.c1 = t2.c1
)
foo
WHERE t1.c1 = foo.c1;
UPDATE 3
TESTDB.ADMIN(ADMIN)=> select * from table_1 order by c1;
C1 | C2
----+----
1 | 10
2 |
3 | 30
(3 rows)
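
The self-join problem described above can be made visible with a plain SELECT. The sketch below uses Python's built-in sqlite3 module, not Netezza, and only imitates how an UPDATE pairs each target row with the FROM list: the alias target is a hypothetical name for the table being updated. Without a condition linking target to t1, every target row is paired with every joined row; with the WHERE correlation used in the subselect version, each target row gets exactly one candidate.

```python
import sqlite3

# SQLite stands in for Netezza; this only counts candidate rows,
# it does not run the UPDATE itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (c1 INTEGER, c2 INTEGER);
    CREATE TABLE table_2 (c1 INTEGER, c2 INTEGER);
    INSERT INTO table_1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO table_2 VALUES (1, 10), (3, 30);
""")

count_sql = """
    SELECT target.c1, COUNT(*) AS candidate_rows
    FROM table_1 AS target
    CROSS JOIN table_1 AS t1
    LEFT JOIN table_2 AS t2 ON t1.c1 = t2.c1
    {where}
    GROUP BY target.c1
    ORDER BY target.c1
"""

# No link between the updated table and its aliased copy: a cross join.
fan_out = conn.execute(count_sql.format(where="")).fetchall()

# Correlating the two copies leaves one candidate per target row.
correlated = conn.execute(
    count_sql.format(where="WHERE target.c1 = t1.c1")
).fetchall()

print(fan_out)
print(correlated)
```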

Select top three values in each group

The following is my sample table and rows:
create table com (company text,val int);
insert into com values ('com1',1),('com1',2),('com1',3),('com1',4),('com1',5);
insert into com values ('com2',11),('com2',22),('com2',33),('com2',44),('com2',55);
insert into com values ('com3',111),('com3',222),('com3',333),('com3',444),('com3',555);
I want to get the top 3 values for each company. The expected output is:
company val
---------------
com1 5
com1 4
com1 3
com2 55
com2 44
com2 33
com3 555
com3 444
com3 333

Try This:
SELECT company, val FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY
company order by val DESC) AS Row_ID FROM com
) AS A
WHERE Row_ID < 4 ORDER BY company
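
The ROW_NUMBER approach can be tried end to end with the sample data. This sketch runs it through Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later, which is an assumption about your Python build). It also sorts by val within each company, since ORDER BY company alone does not fix the order of the three rows.

```python
import sqlite3

# SQLite stands in for PostgreSQL; requires SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE com (company TEXT, val INTEGER);
    INSERT INTO com VALUES ('com1',1),('com1',2),('com1',3),('com1',4),('com1',5);
    INSERT INTO com VALUES ('com2',11),('com2',22),('com2',33),('com2',44),('com2',55);
    INSERT INTO com VALUES ('com3',111),('com3',222),('com3',333),('com3',444),('com3',555);
""")

# Number the rows of each company from the largest val down,
# then keep only the first three per company.
top3_rows = conn.execute("""
    SELECT company, val
    FROM (
        SELECT company, val,
               ROW_NUMBER() OVER (PARTITION BY company ORDER BY val DESC) AS row_id
        FROM com
    ) AS ranked
    WHERE row_id <= 3
    ORDER BY company, val DESC
""").fetchall()

for row in top3_rows:
    print(row)
```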

Since v9.3 you can do a lateral join:
select distinct com_outer.company, com_top.val from com com_outer
join lateral (
select * from com com_inner
where com_inner.company = com_outer.company
order by com_inner.val desc
limit 3
) com_top on true
order by com_outer.company;
It might be faster but, of course, you should test performance specifically on your data and use case.

You can also try array aggregation. ORDER BY inside an aggregate needs PostgreSQL 9.0 or later, and PostgreSQL arrays are 1-based, so the slice is [1:3]:
SELECT company, unnest((array_agg(val ORDER BY val DESC))[1:3]) AS val
FROM com GROUP BY company ORDER BY company;
