PostgreSQL: define a limit within a join?

I have the following tables:
library: |id|...
books: |id|library_id (fk)|...
permissions: |user_id (fk)|library_id (fk)|read bool|...
I want to find the 10 most recent books a certain user (id) can see in any library the user has read permissions on.
A library can have many books.
A user can have one permissions record per library, with read bool true or false.
What I am not sure how to do is limit the books returned per library to a number I want to set dynamically.
Normally I would do this:
select b.id,
l.id
from book b
inner join library l on l.id = b.library_id
inner join permissions p on l.id = p.library_id
where p.user_id=${user.id} and p.read=true
order by b.created_at desc
I am not sure how to return only the 10 (limit) most recent books per library the user has access to.
How can I set the limit per library?

You would do a lateral join:
select b.id,
l.id
from permissions p
inner join library l on l.id = p.library_id
cross join lateral
(select * from book where library_id=l.id order by created_at desc LIMIT 10) b
where p.user_id=${user.id} and p.read=true
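To try the per-library limit without a PostgreSQL server, here is a minimal Python sketch using the standard-library sqlite3 module. SQLite has no LATERAL, so the sketch swaps in a row_number() window function (this assumes SQLite 3.25 or later), which gives the same top-N-per-library result. The sample rows, the integer created_at values, and the function name recent_books_per_library are assumptions made for illustration.

```python
import sqlite3

def recent_books_per_library(conn, user_id, limit):
    """Return (book_id, library_id) for the `limit` newest books in each readable library."""
    sql = """
        SELECT id, library_id
        FROM (
            SELECT b.id, b.library_id,
                   row_number() OVER (PARTITION BY b.library_id
                                      ORDER BY b.created_at DESC) AS rn
            FROM permissions p
            JOIN book b ON b.library_id = p.library_id
            WHERE p.user_id = ? AND p."read" = 1
        ) AS ranked
        WHERE rn <= ?
        ORDER BY library_id, rn
    """
    return conn.execute(sql, (user_id, limit)).fetchall()

# Hypothetical sample data: user 7 may read libraries 1 and 2, but not 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE library (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY, library_id INTEGER, created_at INTEGER);
    CREATE TABLE permissions (user_id INTEGER, library_id INTEGER, "read" INTEGER);
    INSERT INTO library VALUES (1), (2), (3);
    INSERT INTO book VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 15), (5, 3, 99);
    INSERT INTO permissions VALUES (7, 1, 1), (7, 2, 1), (7, 3, 0);
""")

rows = recent_books_per_library(conn, user_id=7, limit=2)
print(rows)  # two newest books of library 1, the only book of library 2
```

Passing the limit as a bound parameter is what makes it dynamic; in PostgreSQL the same idea applies to the LIMIT inside the lateral subquery.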

Related

Is it possible to do a "LIMIT 1" on a left join in Postgres?

I have two tables: one for money and attributes surrounding it (e.g. who earnt it) and a child table for the "ledger" - this contains one or more entries that represent the history of money that has moved.
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
This query works well when there is only one ledger item, but when more are added the SUM will increase. I want to join only the latest row. So hypothetically:
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id ORDER BY pl.ts DESC LIMIT 1
WHERE ...
ORDER BY ...
LIMIT ...
(which sadly doesn't work)
What I have tried:
Using a subquery works, but is painfully slow given the size of the data set (and other omitted properties and where clauses etc.):
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id AND pl.id = (SELECT id FROM payout_ledgers WHERE payout_id = p.id ORDER BY ts DESC LIMIT 1)
Incidentally, I'm unsure why this subquery is so slow (~12 seconds, as opposed to 150ms with no subquery). I would have expected it to be quicker given that we're only selecting based on the foreign key (payout_id).
Another thing I tried was to do a select from the join - my logic being that if we select from small joined dataset instead of the whole table it would be quicker. However I was met with relation "pl" does not exist error:
SELECT SUM(pl.achieved)
FROM payouts p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
WHERE pl.id = (SELECT id FROM pl ORDER BY ts DESC LIMIT 1)
Thank you in advance for any suggestions. I am also open to suggestions for schema changes that could make this type of logic easier, although my preference would be to try and get the query working since the schema is not easy to change on our production environment.
If you're on Postgres 9.3 or later (the release that introduced LATERAL), you can use a LEFT JOIN LATERAL:
SELECT SUM(sub.achieved)
FROM payout p
LEFT JOIN LATERAL (SELECT achieved
FROM payout_ledgers pl
WHERE pl.payout_id = p.id
ORDER BY pl.ts DESC LIMIT 1) sub ON true
This will return the sum of the "achieved" field in the most recent entry in payout_ledgers for all payouts.
Alternatively, use window functions:
-- using row_number()
SELECT SUM(sss.achieved)
FROM (SELECT pl.achieved
, row_number() OVER (PARTITION BY pl.payout_id ORDER BY pl.ts DESC) AS rn
FROM payouts p
JOIN payout_ledgers pl ON pl.payout_id = p.id
) sss
WHERE sss.rn = 1
;
-- using last_value(): the explicit frame lets it see the whole partition,
-- and DISTINCT collapses the result to one row per payout
SELECT SUM(sss.achieved)
FROM (SELECT DISTINCT pl.payout_id
, last_value(pl.achieved) OVER (PARTITION BY pl.payout_id ORDER BY pl.ts ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS achieved
FROM payouts p
JOIN payout_ledgers pl ON pl.payout_id = p.id
) sss
;
BTW: you do not need the LEFT JOIN here; a payout with no ledger rows contributes nothing to the SUM, so an inner join gives the same total.
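The fan-out problem and the row_number() fix can be checked on a few rows. The following is a minimal Python sketch using the standard-library sqlite3 module (assuming SQLite 3.25 or later for window functions); the sample payouts and ledger values are made up for illustration, and the table is named payout as in the question's first query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payout (id INTEGER PRIMARY KEY);
    CREATE TABLE payout_ledgers (id INTEGER PRIMARY KEY, payout_id INTEGER,
                                 ts INTEGER, achieved INTEGER);
    INSERT INTO payout VALUES (1), (2), (3);
    -- payout 1 has two ledger rows, payout 2 has one, payout 3 has none
    INSERT INTO payout_ledgers (payout_id, ts, achieved) VALUES
        (1, 1, 100), (1, 2, 150), (2, 1, 40);
""")

# Plain join: every ledger row is summed, so payout 1 is counted twice.
naive_total = conn.execute("""
    SELECT SUM(pl.achieved)
    FROM payout p
    LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
""").fetchone()[0]

# Rank ledger rows per payout, newest first, and keep only rank 1.
latest_total = conn.execute("""
    SELECT SUM(achieved)
    FROM (
        SELECT pl.achieved,
               row_number() OVER (PARTITION BY pl.payout_id
                                  ORDER BY pl.ts DESC) AS rn
        FROM payout p
        JOIN payout_ledgers pl ON pl.payout_id = p.id
    ) AS ranked
    WHERE rn = 1
""").fetchone()[0]

print(naive_total, latest_total)
```

Payout 3 has no ledger rows and changes neither total, which is why the inner join is enough.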

Postgresql WHERE clause using conditional sub-queries

I have a situation where each of the clients has users and each user can access information about one or more branches.
We also have sys admins who can see everything and, in the database, don't have any sites assigned to them. The record just says the user is a sys admin, so our system does not restrict their access.
I need to write a database query that extracts the list of branches the user has access to, but if the user is a sys admin, I want to extract the list of all branches in the system.
I was trying something like this, but it does not work:
Select sites.name, sites.id
FROM sites
WHERE
sites.id IN (
CASE
WHEN (select u.level FROM users "u" WHERE u.username = 'JohnBrown') ='ROLE_SYSTEM_ADMIN'
THEN
(select id FROM sites)
ELSE
(select s2.id FROM users_have_sites uhs2
left join users u2 ON u2.id = uhs2.user_id
left join sites s2 ON s2.id = uhs2.site_id
where u2.username = 'JohnBrown')
END
)
I am getting this error:
ERROR: more than one row returned by a subquery used as an expression
I think something like this would work for you:
-- DISTINCT avoids duplicate sites when a sys admin's rows fan out over users_have_sites
SELECT DISTINCT s.name, s.id
FROM sites s
LEFT JOIN users_have_sites uhs ON uhs.site_id = s.id
LEFT JOIN users u ON u.id = uhs.user_id AND u.username = 'JohnBrown'
-- the subquery uses its own alias so it does not pick up the outer "u"
WHERE (SELECT au.level FROM users au WHERE au.username = 'JohnBrown') = 'ROLE_SYSTEM_ADMIN'
OR u.id IS NOT NULL;
The LEFT JOINs do not filter out records from the sites table like an INNER JOIN would, so any site that meets either of the conditions in the WHERE clause will be in the result. This means that if your subquery shows that the user is a sys admin, or if a record for that user and site is found in the users_have_sites table, those sites will be in the result set.
EDIT: Another fairly easy to read solution would be something like this:
SELECT s.name, s.id
FROM sites s,
users_have_sites uhs,
users u
WHERE u.username = 'JohnBrown'
AND (u.level = 'ROLE_SYSTEM_ADMIN'
OR (s.id = uhs.site_id AND u.id = uhs.user_id))
GROUP BY s.name, s.id;
The downside of this query is that it uses implicit joins, which are not used much any more. They are generally seen as an older style: every row of one table is combined with every row of the others, and all of the filtering (including what you would normally think of as join conditions) sits in the WHERE clause. Joins of this type can be less efficient, and here the GROUP BY is what ensures there is only one result per site.
I think that this does what you want:
select s.name, s.id
from sites s
inner join users u on u.username = 'JohnBrown'
where
u.level = 'ROLE_SYSTEM_ADMIN'
or exists (
select 1
from users_have_sites uhs
where uhs.site_id = s.id and uhs.user_id = u.id
)
Here is another version of the query that you may find easier to follow (I do):
select s.name, s.id
from users u
inner join sites s
on u.level = 'ROLE_SYSTEM_ADMIN'
or exists (
select 1
from users_have_sites uhs
where uhs.site_id = s.id and uhs.user_id = u.id
)
where u.username = 'JohnBrown'
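The "admin or EXISTS" pattern can be tried on a small data set. Below is a minimal Python sketch using the standard-library sqlite3 module; the sample users, the 'ROLE_USER' level, the site names, and the function name visible_sites are assumptions made for illustration.

```python
import sqlite3

def visible_sites(conn, username):
    """Return (name, id) of every site the user may see; sys admins see all sites."""
    sql = """
        SELECT s.name, s.id
        FROM sites s
        JOIN users u ON u.username = ?
        WHERE u.level = 'ROLE_SYSTEM_ADMIN'
           OR EXISTS (SELECT 1
                      FROM users_have_sites uhs
                      WHERE uhs.site_id = s.id AND uhs.user_id = u.id)
        ORDER BY s.id
    """
    return conn.execute(sql, (username,)).fetchall()

# Hypothetical sample data: JohnBrown is assigned to two sites, Root is a sys admin.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, level TEXT);
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_have_sites (user_id INTEGER, site_id INTEGER);
    INSERT INTO users VALUES (1, 'JohnBrown', 'ROLE_USER'),
                             (2, 'Root', 'ROLE_SYSTEM_ADMIN');
    INSERT INTO sites VALUES (1, 'North'), (2, 'South'), (3, 'East');
    INSERT INTO users_have_sites VALUES (1, 1), (1, 3);
""")

john_sites = visible_sites(conn, "JohnBrown")
admin_sites = visible_sites(conn, "Root")
print(john_sites)
print(admin_sites)
```

Because EXISTS only tests for a matching row, the result never contains duplicate sites, so no DISTINCT or GROUP BY is needed.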

Get distinct row by primary key, but use value from another column

I'm trying to get the sum of the total time that was spent sending all emails within a campaign.
Because of the joins in my query I end up with the 'processing_time' column duplicated over many rows, so sum(s.processing_time) as send_time always overstates how long the sends took.
select
c.id,
c.sender,
c.subject,
count(*) as total_items,
count(distinct s.id) as sends,
sum(s.processing_time) as send_time
from campaigns c
left join sends s on c.id = s.campaigns_id
left join opens o on s.id = o.sends_id
group by c.id;
I'd ideally like to do something like sum(s.processing_time when distinct s.id) but I can't quite work out how to achieve that.
I have made other attempts using case but I always run into the same issue, I need to get the distinct rows based on the ID column, but work with another column.
Since you want statistics related to distinct s.id as well as c.id, group by both columns. Collect the (intermediate) data that you need, and use this table as the inner table in a nested sub-select query. In the outer select, group by c.id alone.
Since the inner select groups by s.id, values which are unique per s.id will not get double-counted when you sum/group by c.id.
SELECT id
, sender
, subject
, sum(total_items) as total_items
, sum(sends) as sends
, sum(processing_time) as send_time
FROM (
SELECT
c.id
, s.id as sid
, count(*) as total_items
, 1 as sends
, s.processing_time
, c.sender
, c.subject
FROM campaigns c
LEFT JOIN sends s on c.id = s.campaigns_id
LEFT JOIN opens o on s.id = o.sends_id
GROUP BY c.id, c.sender, c.subject, s.processing_time, s.id) t
GROUP BY id, sender, subject
ORDER BY id
Since the final table includes sender and subject, you'll need to group by these columns as well to avoid an error such as:
ERROR: column "c.sender" must appear in the GROUP BY clause or be used in an aggregate function
LINE 14: , c.sender
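To see the double counting and the two-level grouping side by side, here is a minimal Python sketch using the standard-library sqlite3 module. The single campaign, its two sends, and the three opens are made-up sample rows, and sender and subject are left out of the queries to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaigns (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT);
    CREATE TABLE sends (id INTEGER PRIMARY KEY, campaigns_id INTEGER,
                        processing_time INTEGER);
    CREATE TABLE opens (id INTEGER PRIMARY KEY, sends_id INTEGER);
    INSERT INTO campaigns VALUES (1, 'a@example.com', 'Hello');
    INSERT INTO sends VALUES (10, 1, 5), (11, 1, 7);
    -- send 10 was opened three times, send 11 never
    INSERT INTO opens VALUES (100, 10), (101, 10), (102, 10);
""")

# Single-level grouping: send 10's processing_time is summed once per open.
naive = conn.execute("""
    SELECT c.id, count(*), count(DISTINCT s.id), sum(s.processing_time)
    FROM campaigns c
    LEFT JOIN sends s ON c.id = s.campaigns_id
    LEFT JOIN opens o ON s.id = o.sends_id
    GROUP BY c.id
""").fetchall()

# Two-level grouping: collapse to one row per send first, then sum per campaign.
nested = conn.execute("""
    SELECT id, sum(total_items), sum(sends), sum(processing_time)
    FROM (
        SELECT c.id, s.id AS sid, count(*) AS total_items,
               1 AS sends, s.processing_time
        FROM campaigns c
        LEFT JOIN sends s ON c.id = s.campaigns_id
        LEFT JOIN opens o ON s.id = o.sends_id
        GROUP BY c.id, s.id, s.processing_time
    ) t
    GROUP BY id
""").fetchall()

print(naive)   # send_time inflated by the opens
print(nested)  # send_time counted once per send
```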

Get Greatest date across multiple columns with entity framework

I have three entities: Group, Activity, and Comment. Each entity is represented in the db as a table. A Group has many Activities, and an Activity has many comments. Each entity has a CreatedDate field. I need to query for all groups + the CreatedDate of the most recent entity created on that Group's object graph.
I've constructed a SQL query that gives me what I need, but I'm not sure how to do this in Entity Framework. Specifically this line: (SELECT MAX(X) FROM (VALUES (g.CreatedDate), (a.CreatedDate), (c.CreatedDate)) AS AllDates(X)). Thanks in advance for your help. Here's the full query:
WITH GroupWithLastActivityDate AS (
SELECT DISTINCT
g.Id
,g.GroupName
,g.GroupDescription
,g.CreatedDate
,g.ApartmentComplexId
,(SELECT MAX(X)
FROM (VALUES (g.CreatedDate), (a.CreatedDate), (c.CreatedDate)) AS AllDates(X)) AS LastActivityDate
FROM Groups g
LEFT OUTER JOIN Activities a
on g.Id = a.GroupId
LEFT OUTER JOIN Comments c
on a.Id = c.ActivityId
WHERE g.IsActive = 1
)
SELECT
GroupId = g.Id
,g.GroupName
,g.GroupDescription
,g.ApartmentComplexId
,NumberOfActivities = COUNT(DISTINCT a.Id)
,g.CreatedDate
,LastActivityDate = Max(g.LastActivityDate)
FROM GroupWithLastActivityDate g
INNER JOIN Activities a
on g.Id = a.GroupId
WHERE a.IsActive = 1
GROUP BY g.Id
,g.GroupName
,g.GroupDescription
,g.CreatedDate
,g.ApartmentComplexId
I should add that for now I've constructed a view with this query (plus some other stuff) which I'm querying with a SqlQuery.

MS Access INNER JOIN most recent entry

I'm having some trouble trying to get Microsoft Access 2007 to accept my SQL query but it keeps throwing syntax errors at me that don't help me correct the problem.
I have two tables, let's call them Customers and Orders for ease.
I need some customer details, but also a few details from the most recent order. I currently have a query like this:
SELECT c.ID, c.Name, c.Address, o.ID, o.Date, o.TotalPrice
FROM Customers c
INNER JOIN Orders o
ON c.ID = o.CustomerID
AND o.ID = (SELECT TOP 1 ID FROM Orders WHERE CustomerID = c.ID ORDER BY Date DESC)
To me, it appears valid, but Access keeps throwing syntax errors at me, and when I hit OK it selects a piece of the SQL text that doesn't even relate to the problem.
If I take the extra SELECT clause out it works but is obviously not what I need.
Any ideas?
You cannot use AND in that way in MS Access; change it to WHERE. In addition, you have two reserved words in your column (field) names: Name and Date. These should be enclosed in square brackets when not prefixed by a table name or alias, or better, renamed.
SELECT c.ID, c.Name, c.Address, o.ID, o.Date, o.TotalPrice
FROM Customers c
INNER JOIN Orders o
ON c.ID = o.CustomerID
WHERE o.ID = (
SELECT TOP 1 ID FROM Orders
WHERE CustomerID = c.ID ORDER BY [Date] DESC)
I worked out how to do it in Microsoft Access. You INNER JOIN on a pre-sorted sub-query. That way you don't have to do multiple ON conditions which aren't supported.
SELECT c.ID, c.Name, c.Address, o.OrderNo, o.OrderDate, o.TotalPrice
FROM Customers c
INNER JOIN (SELECT * FROM Orders ORDER BY OrderDate DESC) o
ON c.ID = o.CustomerID
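How efficient this is is another story. Note also that the join still returns every order for each customer (sorted, at best), so on its own it does not restrict the result to the most recent order.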
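As a rough check of the two approaches above, the following Python sketch runs analogous queries in SQLite through the standard-library sqlite3 module. This is not Access: LIMIT 1 stands in for TOP 1, and the OrderDate column, the sample customers, and the sample orders are assumptions made for illustration. In this sketch the correlated subquery yields one row per customer, while the join on the sorted subquery yields one row per order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         OrderDate TEXT, TotalPrice INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES (1, 1, '2020-01-01', 10),
                              (2, 1, '2021-06-01', 20),
                              (3, 2, '2019-03-03', 5);
""")

# Correlated subquery in WHERE: keeps only each customer's newest order.
latest_only = conn.execute("""
    SELECT c.ID, o.ID
    FROM Customers c
    JOIN Orders o ON c.ID = o.CustomerID
    WHERE o.ID = (SELECT ID FROM Orders
                  WHERE CustomerID = c.ID
                  ORDER BY OrderDate DESC LIMIT 1)
    ORDER BY c.ID
""").fetchall()

# Join on a sorted subquery: sorting alone does not drop the older orders.
sorted_join = conn.execute("""
    SELECT c.ID, o.ID
    FROM Customers c
    JOIN (SELECT * FROM Orders ORDER BY OrderDate DESC) o
      ON c.ID = o.CustomerID
""").fetchall()

print(latest_only)
print(len(sorted_join))
```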