How to get flat aggregation of two calls to json_agg - postgresql

I have the following tables:
products
id | name
----+-------------
1 | Shampoo
2 | Conditioner
productOptions
id | name | productId
----+-------------+-----------
1 | Hair Growth | 1
2 | Frizzy Hair | 1
images
id | fileName | productOptionId
----+-----------+-----------------
1 | bee.png | 1
2 | fancy.png | 2
3 | soap.png | 2
products have many productOptions, and productOptions have many images.
Following from this question, I have aggregated images.fileName twice to get an aggregated list of the fileNames for each product:
SELECT p.name, o.options, o.images
FROM products p
LEFT JOIN (
SELECT "productId", array_agg(name) AS options, json_agg(i.images) AS images
FROM "productOptions" o
LEFT JOIN (
SELECT "productOptionId", json_agg(i."fileName") AS images
FROM images i
GROUP BY 1
) i ON i."productOptionId" = o.id
GROUP BY 1
) o ON o."productId" = p.id;
name | options | images
-------------+-------------------------------+------------------------------------------
Shampoo | {"Hair Growth","Frizzy Hair"} | [["bee.png"], ["fancy.png", "soap.png"]]
Conditioner | |
I am wondering how to flatten the second json_agg so that the list of images is flat, and if my overall approach makes sense.

I didn't need to call json_agg inside the innermost join. Instead, I can select the file names unaggregated and call array_agg(i.images) at the same level where array_agg(name) AS options is called, which gives a flat list of images:
SELECT p.name, o.options, o.images
FROM products p
LEFT JOIN (
SELECT "productId", array_agg(DISTINCT name) AS options, array_agg(i.images) AS images
FROM "productOptions" o
LEFT JOIN (
SELECT "productOptionId", i."fileName" AS images
FROM images i
) i ON i."productOptionId" = o.id
GROUP BY 1
) o ON o."productId" = p.id;
name | options | images
-------------+-------------------------------+------------------------------
Shampoo | {"Frizzy Hair","Hair Growth"} | {bee.png,fancy.png,soap.png}
Conditioner | |

A different approach:
I used DISTINCT inside json_agg() (array_agg() works as well) so that each option name is not repeated once per joined image.
SELECT
p.name,
json_agg(DISTINCT po.name) AS options,
json_agg(i."fileName") AS images
FROM products p
LEFT JOIN "productOptions" po ON p.id = po."productId"
LEFT JOIN images AS i ON po.id = i."productOptionId"
GROUP BY p.name;
Or using subqueries:
SELECT
p.name,
po.options,
poi.images
FROM products p
LEFT JOIN (SELECT "productId", json_agg(name) AS options
FROM "productOptions"
GROUP BY "productId") AS po ON p.id = po."productId"
LEFT JOIN (SELECT
po."productId",
json_agg(i."fileName") AS images
FROM "productOptions" po
INNER JOIN images i ON i."productOptionId" = po.id
GROUP BY po."productId") AS poi ON p.id = poi."productId";
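One detail worth checking in the single-query version with two LEFT JOINs: for a product without options or images (Conditioner here), json_agg() aggregates the single NULL produced by the outer join and returns [null] rather than an empty result. A minimal sketch of one way around that, assuming PostgreSQL 9.4 or later for the FILTER clause and assuming an empty JSON array is the preferred placeholder:

```sql
-- Sketch: same joins as the single-query version, but NULLs from the
-- outer joins are filtered out before aggregation, and an aggregate
-- with no remaining input is replaced by an empty JSON array.
SELECT
  p.name,
  COALESCE(json_agg(DISTINCT po.name) FILTER (WHERE po.name IS NOT NULL),
           '[]'::json) AS options,
  COALESCE(json_agg(i."fileName") FILTER (WHERE i."fileName" IS NOT NULL),
           '[]'::json) AS images
FROM products p
LEFT JOIN "productOptions" po ON po."productId" = p.id
LEFT JOIN images i ON i."productOptionId" = po.id
GROUP BY p.id, p.name;
```

Grouping by p.id as well as p.name keeps two products with the same name from being merged into one row.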

Related

Transforming information in postgresql

I have two tables.
The first table contains user information:
user_id | name
1 | Albert
2 | Anthony
The second table contains address information.
A user can have a home address, an office address, or both:
user_id| address_type | address
1 | home | a
1 | office | b
2 | home | c
and the final result I want is this
user_id | name | home_address | office_address
1 | Albert | a | b
2 | Anthony | c | null
I have tried using a left join and json_agg, but the result is not readable.
Any suggestions on how I can do this?
You can use two outer joins, one for the office address and one for the home address.
select t1.user_id, t1.name,
ha.address as home_address,
oa.address as office_address
from table1 t1
left join table2 ha on ha.user_id = t1.user_id and ha.address_type = 'home'
left join table2 oa on oa.user_id = t1.user_id and oa.address_type = 'office';
A solution using JSON could look like this
select t1.user_id, t1.name,
a.addresses ->> 'home' as home_address,
a.addresses ->> 'office' as office_address
from table1 t1
left join (
select user_id, jsonb_object_agg(address_type, address) as addresses
from table2
group by user_id
) a on a.user_id = t1.user_id;
This might be a bit more flexible, because you don't need to add a new join for each address type. The first query is likely to be faster if you need to retrieve a large number of rows.
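A third option is conditional aggregation, which needs only one join. This is a sketch, assuming PostgreSQL 9.4 or later (for the FILTER clause) and assuming each user has at most one address per address_type; if a user had several, max() would simply pick the largest value.

```sql
-- Sketch: pivot address rows into columns with filtered aggregates.
-- table1 / table2 are the table names used in the answers above.
select t1.user_id, t1.name,
       max(t2.address) filter (where t2.address_type = 'home')   as home_address,
       max(t2.address) filter (where t2.address_type = 'office') as office_address
from table1 t1
left join table2 t2 on t2.user_id = t1.user_id
group by t1.user_id, t1.name;
```

A user without an address of a given type gets NULL in that column, because the filtered aggregate receives no rows.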

Postgresql select based on another select with order, limit and agregate

I have tables like:
books:
id | title | rating_avr | visible
1 | 'Overlord' | 5 | true
2 | 'Avengers' | 10 | false
tags_books:
tag_id | book_id | likes
1 | 1 | 5
2 | 1 | 25
1 | 2 | 11
tags:
id | name
1 | 'Adventure'
2 | 'Drama'
Now I need to load books that have the tag 'Drama', with LIMIT and ORDER BY, and aggregate the tags for each book.
I managed to achieve this using this query:
SELECT b.id, b.title, b.rating_avr, json_agg(json_build_object('id', tb2.tag_id)) as tags
FROM books b
LEFT JOIN tags_books tb ON tb.book_id = b.id
LEFT JOIN tags_books tb2 ON tb2.book_id = b.id
WHERE tb.tag_id = 1 AND b.visible=true
GROUP BY b.id ORDER BY b.rating_avr DESC LIMIT 5
What I'm curious about:
1) Is it OK to join the same table twice? The first join is for the WHERE clause and the second is to aggregate the tags.
2) How can I order the aggregated tags by likes?
3) Is this the right approach, or is there a better way to do it?
It is strange that your query doesn't use the tags table, even though you want to fetch books with the tag 'Drama', which is a value of the name column in tags.
What I would do is first get the ids of all the books with tag 'Drama' with a query like this:
SELECT b.id FROM books b
INNER JOIN tags_books tb ON tb.book_id = b.id
INNER JOIN tags t ON t.id = tb.tag_id
WHERE t.name = 'Drama' AND b.visible=true
and then use it to get the result:
SELECT
b.id, b.title, b.rating_avr,
json_agg(json_build_object('id', tb.tag_id) order by tb.likes desc) as tags
FROM books b INNER JOIN tags_books tb
ON tb.book_id = b.id
WHERE b.id IN (
SELECT b.id FROM books b
INNER JOIN tags_books tb ON tb.book_id = b.id
INNER JOIN tags t ON t.id = tb.tag_id
WHERE t.name = 'Drama' AND b.visible=true
)
GROUP BY b.id, b.title, b.rating_avr
ORDER BY b.rating_avr DESC LIMIT 5
See the demo.
Results:
> id | title | rating_avr | tags
> -: | :------- | ---------: | :-----------------------
> 1 | Overlord | 5 | [{"id" : 2}, {"id" : 1}]
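The same result can probably be obtained without the IN subquery, by aggregating all tags of each visible book and keeping only the groups that contain 'Drama'. A minimal sketch using bool_or() in a HAVING clause, against the tables from the question:

```sql
-- Sketch: one pass over the joins; HAVING keeps only books where at
-- least one of the aggregated tags is named 'Drama'.
SELECT b.id, b.title, b.rating_avr,
       json_agg(json_build_object('id', tb.tag_id) ORDER BY tb.likes DESC) AS tags
FROM books b
INNER JOIN tags_books tb ON tb.book_id = b.id
INNER JOIN tags t ON t.id = tb.tag_id
WHERE b.visible = true
GROUP BY b.id, b.title, b.rating_avr
HAVING bool_or(t.name = 'Drama')
ORDER BY b.rating_avr DESC
LIMIT 5;
```

Which variant is faster depends on the data: this one aggregates the tags of every visible book before filtering, while the IN subquery can narrow the set of books first.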

Why does my LEFT JOIN work when I perform "SELECT * ", but fails when I select only the necessary columns?

I'm new to SQL but I'm trying to join two tables. However, it's not working as I expected. This is in Postgresql.
Here are the tables I'm trying to join.
My Tables
SELECT * FROM houses;
id | name | address | picture
----+----------------+-------------+------------
1 | House 1 | 440 S 3rd W | long-link2.jpg
2 | House 2 | 538 S 5th E | long-link.jpg
SELECT house_id, trunc(avg(score), 1) FROM house_reviews GROUP BY house_id;
house_id | trunc
----------+-------
1 | 3.0
2 | 3.0
My JOIN statements
Attempt 1 (works)
SELECT * FROM houses
LEFT JOIN (SELECT house_id, trunc(avg(score), 1) FROM house_reviews GROUP BY house_id) AS r
ON houses.id = r.house_id;
Attempt 2 (does not work)
SELECT id, name, address FROM houses
LEFT JOIN (SELECT house_id, trunc(avg(score), 1) FROM house_reviews GROUP BY house_id) AS r
ON houses.id = r.house_id;
The only difference between the two is that I don't select the picture in the attempt 2. But attempt 2 doesn't seem to join at all. Instead it displays
id | name | address
----+----------------+-------------
1 | Tuscany | 440 S 2nd W
2 | Mountain Lofts | 538 S 2nd W
meaning that it failed to join and is instead just displaying the houses table.
My Question
I'm confused why the join failed in the second table because I removed only one arbitrary column (pictures).
Is there a way that I can join the two tables together but also exclude the pictures column from the "houses" table?
Thank you!
You're only seeing data from houses because those are the only columns you selected; the join itself works. Give the aggregate an alias and add it to the select list:
SELECT
h.id, h.name, h.address,
r.avg_score
FROM houses h
LEFT JOIN (
SELECT house_id, trunc(avg(score), 1) avg_score
FROM house_reviews
GROUP BY house_id
) AS r
ON h.id = r.house_id;

Unpack expression results from case statement

Four categories in category table.
id | name
--------------
1 | 'wine'
2 | 'chocolate'
3 | 'autos'
4 | 'real estate'
Two of the many (thousands of) forecasters in forecaster table.
id | name
--------------
1 | 'sothebys'
2 | 'cramer'
Relevant forecasts by the forecasters for the categories in the forecast table.
| id | forecaster_id | category_id | forecast |
|----+---------------+-------------+--------------------------------------------------------------|
| 1 | 1 | 1 | 'bad weather, prices rise short-term' |
| 2 | 1 | 2 | 'cocoa bean surplus, prices drop' |
| 3 | 1 | 3 | 'we dont deal with autos - no idea' |
| 4 | 2 | 2 | 'sell, sell, sell' |
| 5 | 2 | 3 | 'demand for cocoa will skyrocket - prices up - buy, buy buy' |
I want prioritized mapping of (forecaster, category, forecast) such that, if a forecast exists for some primary forecaster (e.g. 'cramer') use it because I trust him more. If a forecast exists for some secondary forecaster (e.g. 'sothebys') use that. If no forecast exists for a category, return a row with that category and null for forecast.
I have something that almost works and after I get the logic down I hope to turn into parameterized query.
select
case when F1.category is not null
then (F1.forecaster, F1.category, F1.forecast)
when F2.category is not null
then (F2.forecaster, F2.category, F2.forecast)
else (null, C.category, null)
end
from
(
select
FR.name as forecaster,
C.id as cid,
C.category as category,
F.forecast
from
forecast F
inner join forecaster FR on (F.forecaster_id = FR.id)
inner join category C on (C.id = F.category_id)
where FR.name = 'cramer'
) F1
right join (
select
FR.name as forecaster,
C.id as cid,
C.category as category,
F.forecast
from
forecast F
inner join forecaster FR on (F.forecaster_id = FR.id)
inner join category C on (C.id = F.category_id)
where FR.name = 'sothebys'
) F2 on (F1.cid = F2.cid)
full outer join category C on (C.id = F2.cid);
This gives:
'(sothebys,wine,"bad weather, prices rise short-term")'
'(cramer,chocolate,"sell, sell, sell")'
'(cramer,autos,"demand for cocoa will skyrocket - prices up - buy, buy buy")'
'(,"real estate",)'
While that is the desired data it is a record of one column instead of three. The case was the only way I could find to achieve the ordering of cramer first sothebys next and there is lots of duplication. Is there a better way and how can the tuple like results be pulled back apart into columns?
Any suggestions, especially related to removal of duplication or general simplification appreciated.
This sounds like a case for DISTINCT ON (untested):
SELECT DISTINCT ON (c.id)
fr.name AS forecaster,
c.name AS category,
f.forecast
FROM forecast f
JOIN forecaster fr ON f.forecaster_id = fr.id
RIGHT JOIN category c ON f.category_id = c.id
ORDER BY
c.id,
CASE WHEN fr.name = 'cramer' THEN 0
WHEN fr.name = 'sothebys' THEN 1
ELSE 2
END;
For each category, the first row in the ordering will be picked. Since the CASE expression sorts 'cramer' (0) before 'sothebys' (1), Cramer's forecast is preferred whenever one exists.
Adapt the ORDER BY clause if you need a more complicated ranking.
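Note that the query above also returns a forecast from any other forecaster (rank 2) when neither preferred forecaster covers a category. If such categories should instead come back with NULLs, one possible adaptation is to restrict the joined forecasters before the outer join. This is an untested sketch along the same lines, using the category.name column from the table definition:

```sql
-- Sketch: only forecasts by the two trusted forecasters take part in
-- the outer join, so other categories keep NULL forecaster/forecast.
SELECT DISTINCT ON (c.id)
       fr.name AS forecaster,
       c.name  AS category,
       f.forecast
FROM category c
LEFT JOIN (forecast f
           JOIN forecaster fr
             ON fr.id = f.forecaster_id
            AND fr.name IN ('cramer', 'sothebys'))
       ON f.category_id = c.id
ORDER BY c.id,
         CASE fr.name WHEN 'cramer' THEN 0 ELSE 1 END;
```

The two names are the natural candidates for query parameters once the logic is settled.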

Select rows of multiple tables via joins

I have some tables which are related to each other.
A short demonstration:
Sites:
id | clip_id | article_id | unit_id
--------------+------------+--------
1 | 123 | 12 | 7
Clips:
id | title | desc |
------------+--------
1 | foo2 | abc1
Articles:
id | title | desc | slug
------------+---------------------
1 | foo2 | abc1 | article.html
Units:
id | vertical_id | title |
------------------+-------+
1 | 123 | abc |
Verticals:
id | name |
-----------+
1 | vfoo |
Now I want to do something like below:
SELECT ALL VERTICAL, UNIT, SITE, CLIP, ARTICLE attributes
from VERTICAL, UNIT, SITE, CLIP, ARTICLE TABLES
WHERE vertical_id = 2
Can someone show me how to use joins for this?
Here is a running example of possibly what you want: http://sqlfiddle.com/#!15/af63b/2
select * from
sites
inner join units on sites.unit_id=units.id
inner join clips on clips.id=sites.clip_id
inner join articles on articles.id=sites.article_id
inner join verticals on verticals.id=units.vertical_id
where units.vertical_id=123
The problem is that the description you gave us did not clearly specify which columns to join on:
(answered) Why does units have a link to site via site_id and sites a link back to units via unit_id?
(answered) Why does units have a link to verticals via vertical_id and verticals a link back to units via unit_id?
I am guessing that your sample data is not consistent enough for the join to return any rows. For vertical_id=123 there is no corresponding entry in verticals.
Edit:
I corrected the SQL due to corrections within the question. With this the two questions are answered.
select s.id, s.clip_id, s.article_id, u.title, u.vertical_id, c.title, v.name, c."desc", a.slug
from sites s
join units u on u.id = s.unit_id
join clips c on c.id = s.clip_id
join verticals v on v.id = u.vertical_id
join articles a on a.id = s.article_id
where u.vertical_id = 2 -- replace 2 with the vertical id you need