Performing a batch select latest n in Postgres? - postgresql

Suppose I have a query for fetching the latest 10 books for a given author, like this:
SELECT *
FROM books
WHERE author_id = #author_id
ORDER BY published DESC, id
LIMIT 10
Now if I have a list of n authors I want to get the latest books for, then I can run this query n times. Note that n is reasonably small. However, this seems like an optimization opportunity.
Is there a single query that can efficiently fetch the latest 10 books for each of the n given authors?
This query doesn't work (it fetches only 10 books in total, not n * 10):
SELECT *
FROM books
WHERE author_id = ANY(#author_ids)
ORDER BY published DESC, id
LIMIT 10

First number each author's books from most to least recently published using ROW_NUMBER(), then filter on that number in the outer query to keep only the desired rows.
SELECT *
FROM (SELECT *
, ROW_NUMBER() OVER (PARTITION BY author_id ORDER BY published DESC, id) row_num
FROM books
WHERE author_id = ANY(#author_ids)) t
WHERE t.row_num <= 10
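The ROW_NUMBER() approach above can be tried on a small data set. The following is a minimal sketch, assuming Python's bundled sqlite3 module (with an SQLite build that has window functions, 3.25 or later) as a stand-in for Postgres; the sample rows, the author list (1, 3), and the limit of 2 are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: 3 authors with 4 books each.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, published TEXT)"
)
rows = [(a * 10 + k, a, f"2020-01-{k + 1:02d}") for a in (1, 2, 3) for k in range(4)]
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", rows)

# Number each author's books newest-first, then keep the first 2 per author.
TOP_N_PER_AUTHOR = """
SELECT id, author_id, published
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY author_id
                                ORDER BY published DESC, id) AS row_num
      FROM books
      WHERE author_id IN (1, 3)) AS t
WHERE row_num <= 2
ORDER BY author_id, published DESC, id
"""

latest = conn.execute(TOP_N_PER_AUTHOR).fetchall()
for row in latest:
    print(row)
```

Each requested author contributes at most 2 rows, and author 2 is excluded by the inner WHERE clause.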

SELECT b.*
FROM authors a
JOIN LATERAL (
SELECT *
FROM books b
WHERE b.author_id = a.id
ORDER BY b.published DESC, b.id
LIMIT 10
) b ON TRUE
WHERE a.id = ANY(#author_ids)

Related

How to retrieve the N first rows AND the N last rows in only one request?

Let's say we have a huge query like this:
SELECT id, quality FROM products ORDER BY quality
Is it possible to retrieve the first N rows AND the last N rows of the results without performing two requests?
What I want to avoid (two requests):
SELECT id, quality FROM products ORDER BY quality LIMIT 5;
SELECT id, quality FROM products ORDER BY quality DESC LIMIT 5;
Context: the actual request is very CPU/time consuming, that's why I want to limit to one request if possible.
Using a WITH clause to avoid writing the same code twice:
WITH my_complex_query AS (
SELECT * FROM table_name
)
(SELECT * FROM my_complex_query ORDER BY id ASC LIMIT 5)
UNION ALL
(SELECT * FROM my_complex_query ORDER BY id DESC LIMIT 5)
(SELECT * FROM table_name ORDER BY id ASC LIMIT 5) UNION (SELECT * FROM table_name ORDER BY id DESC LIMIT 5);
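A different technique from the two answers above, offered only as a sketch: number the rows from both ends with two ROW_NUMBER() windows in one subquery and keep the rows where either number is small enough. The example assumes Python's sqlite3 module (SQLite 3.25 or later) in place of the real database, hypothetical sample data, and N = 2. Whether this is cheaper than the WITH version depends on the planner, so it is worth comparing the plans on the real query.

```python
import sqlite3

# Hypothetical sample data: ids 1..10 with quality 10..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, quality INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)", [(i, i * 10) for i in range(1, 11)]
)

# rn_asc counts from the lowest quality, rn_desc from the highest.
FIRST_AND_LAST = """
SELECT id, quality
FROM (SELECT id, quality,
             ROW_NUMBER() OVER (ORDER BY quality, id) AS rn_asc,
             ROW_NUMBER() OVER (ORDER BY quality DESC, id DESC) AS rn_desc
      FROM products) AS t
WHERE rn_asc <= 2 OR rn_desc <= 2
ORDER BY quality, id
"""

ends = conn.execute(FIRST_AND_LAST).fetchall()
print(ends)
```

If the table has fewer than 2 * N rows, a row that qualifies from both ends appears only once here, whereas UNION ALL would return it twice.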

Inner join removed from the SQL query

I have the SQL query below to fetch three records for notification purposes.
SELECT orders.msg
FROM orders
INNER JOIN
(
SELECT id
FROM orders
WHERE type_id = 12
ORDER BY id DESC LIMIT 3 OFFSET 0
) AS items
ON orders.id = items.id;
When trying to optimize the query, I made the changes below.
SELECT orders.msg
FROM orders
WHERE type_id = 12
ORDER BY id DESC LIMIT 3 OFFSET 0;
Is the modified query OK, did I miss anything, or is there another way of doing this?
The simplified version on the bottom looks logically identical, to me, to the one on top:
SELECT msg
FROM orders
WHERE type_id = 12
ORDER BY id DESC LIMIT 3;
Note that the above query could benefit from the following index:
CREATE INDEX idx ON orders (type_id, id, msg);
This index would completely cover the WHERE, ORDER BY, and SELECT clauses.
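The equivalence claimed above can be checked on sample data. This is a minimal sketch, assuming Python's sqlite3 module rather than the asker's actual database, and hypothetical rows in which even ids have type_id 12. It runs the original join form and the simplified form and compares the results.

```python
import sqlite3

# Hypothetical sample data: ids 1..10, even ids have type_id = 12.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, type_id INTEGER, msg TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, 12 if i % 2 == 0 else 7, f"msg-{i}") for i in range(1, 11)],
)

# Original form: self-join against the three newest ids of type 12.
with_join = conn.execute("""
    SELECT orders.msg
    FROM orders
    INNER JOIN (SELECT id FROM orders WHERE type_id = 12
                ORDER BY id DESC LIMIT 3) AS items
            ON orders.id = items.id
""").fetchall()

# Simplified form: filter, sort, and limit directly.
simplified = conn.execute("""
    SELECT msg FROM orders WHERE type_id = 12 ORDER BY id DESC LIMIT 3
""").fetchall()

# The join form has no outer ORDER BY, so compare as sorted lists.
print(sorted(with_join) == sorted(simplified))
```

One difference worth noting: only the simplified query guarantees the order of the returned rows, because the join form sorts inside the subquery and not in the outer query.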
You can try this also:
SELECT orders.msg
FROM orders
WHERE orders.id
IN (
SELECT id
FROM orders
WHERE type_id = 12
ORDER BY id
DESC LIMIT 3 OFFSET 0
)

How to filter database table by a multiple join records from another one table but different types?

I have a products table and a corresponding ratings table, which contains a foreign key product_id, a grade (int), and a type, which is an enum accepting the values robustness and price_quality_ratio.
The grades accept values from 1 to 10. So, for example, what would the query look like if I wanted to filter the products where the minimum grade for robustness is 7 and the minimum grade for price_quality_ratio is 8?
You can join twice, once per rating type. The inner joins eliminate the products that fail either rating criterion:
select p.*
from products p
inner join rating r1
on r1.product_id = p.product_id
and r1.type = 'robustness'
and r1.rating >= 7
inner join rating r2
on r2.product_id = p.product_id
and r2.type = 'price_quality_ratio'
and r2.rating >= 8
Another option is to use conditional aggregation. This requires only one join, then a group by; the rating criteria are checked in the having clause.
select p.product_id, p.product_name
from products p
inner join rating r
on r.product_id = p.product_id
and r.type in ('robustness', 'price_quality_ratio')
group by p.product_id, p.product_name
having
min(case when r.type = 'robustness' then r.rating end) >= 7
and min(case when r.type = 'price_quality_ratio' then r.rating end) >= 8
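The conditional aggregation approach can be exercised on a few rows. This is a minimal sketch, assuming Python's sqlite3 module in place of the real database. It follows the answer's naming (a table rating with a column rating) rather than the question's grade column, and the product names and grades are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE rating (product_id INTEGER, type TEXT, rating INTEGER);
INSERT INTO products VALUES (1, 'anvil'), (2, 'kettle'), (3, 'ladder');
INSERT INTO rating VALUES
  (1, 'robustness', 9), (1, 'price_quality_ratio', 8),
  (2, 'robustness', 9), (2, 'price_quality_ratio', 5),
  (3, 'robustness', 7), (3, 'robustness', 6), (3, 'price_quality_ratio', 10);
""")

# One join, one GROUP BY; each CASE picks out the grades of a single type,
# and MIN() ignores the NULLs produced for the other type.
QUERY = """
SELECT p.product_id, p.product_name
FROM products p
INNER JOIN rating r ON r.product_id = p.product_id
GROUP BY p.product_id, p.product_name
HAVING MIN(CASE WHEN r.type = 'robustness' THEN r.rating END) >= 7
   AND MIN(CASE WHEN r.type = 'price_quality_ratio' THEN r.rating END) >= 8
ORDER BY p.product_id
"""

matches = conn.execute(QUERY).fetchall()
print(matches)
```

Product 2 fails on price_quality_ratio, and product 3 fails because one of its two robustness grades is below 7, so only product 1 remains.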
The JOIN proposed by @GMB would've been my first suggestion as well. If that gets too complicated with having to maintain too many rX.ratings, you can also use a nested query:
SELECT *
FROM (
SELECT p.*, r1.rating as robustness, r2.rating as price_quality_ratio
FROM products p
JOIN rating r1 ON (r1.product_id = p.product_id AND r1.type = 'robustness')
JOIN rating r2 ON (r2.product_id = p.product_id AND r2.type = 'price_quality_ratio')
) AS tmp
WHERE robustness >= 7
AND price_quality_ratio >= 8
-- ORDER BY (price_quality_ratio DESC, robustness DESC) -- etc

How do I do LIMIT within GROUP in the same table?

I can't figure out how to do limit within group although I've read all similar questions here. Reading PSQL doc didn't help either :( Consider the following:
CREATE TABLE article_relationship
(
article_from INT NOT NULL,
article_to INT NOT NULL,
score INT
);
I want to get a list of the top 5 related articles for each of the given article IDs, sorted by score.
Here is what I tried:
select DISTINCT o.article_from
from article_relationship o
join lateral (
select i.article_from, i.article_to, i.score from article_relationship i
order by score desc
limit 5
) p on p.article_from = o.article_from
where o.article_from IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.article_from;
And it returns nothing. I was under the impression that the outer query acts like a loop, so I guess I only need the source IDs there.
Also, what if I want to join on an articles table that has columns id and title, and get the titles of the related articles in the result set?
I added a join in the inner query:
select o.id, p.*
from articles o
join lateral (
select a.title, i.article_from, i.article_to, i.score
from article_relationship i
INNER JOIN articles a on a.id = i.article_to
where i.article_from = o.id
order by score desc
limit 5
) p on true
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;
But that made it very, very slow.
The problem with no rows returning from your query is that your join condition is wrong: ON p.article_from = o.article_from; this should obviously be ON p.article_from = o.article_to.
That issue aside, your query will not return the top 5 scoring relations per article id; instead it will return the article IDs that reference one of the 5 top rated referenced articles throughout the table and (also) at least 1 of the 5 referenced articles for which you specify the id.
You can get the top 5 rated referenced articles per referencing article with a window function to rank the scores in a sub-select and then select only the top 5 in the main query. Specifying a list of referenced article IDs effectively means that you will rank how these referenced articles are scored for each referencing article:
SELECT article_from, article_to, score
FROM (
SELECT article_from, article_to, score,
rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
FROM article_relationship
WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
WHERE rnk < 6
ORDER BY article_from, score DESC;
This is different from your code in that it returns up to 5 records for each article_from but it is consistent with your initial description.
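One caveat about the 5-row cap: rank() gives tied scores the same rank, so a filter on the rank can let through more rows than the limit when scores tie, while row_number() with a tiebreaker cannot. A minimal sketch of the difference, assuming Python's sqlite3 module (SQLite 3.25 or later), made-up rows with three tied scores, and a cap of 2 to keep the data small:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_relationship "
    "(article_from INT NOT NULL, article_to INT NOT NULL, score INT)"
)
# Hypothetical data: three relations tie on the top score.
conn.executemany(
    "INSERT INTO article_relationship VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 11, 5), (1, 12, 5), (1, 13, 1)],
)

# rank(): the three tied rows all get rank 1, so all pass rnk < 3.
with_rank = conn.execute("""
    SELECT article_to
    FROM (SELECT article_to,
                 rank() OVER (PARTITION BY article_from
                              ORDER BY score DESC) AS rnk
          FROM article_relationship) AS a
    WHERE rnk < 3
    ORDER BY article_to
""").fetchall()

# row_number() with article_to as tiebreaker: exactly two rows pass.
with_row_number = conn.execute("""
    SELECT article_to
    FROM (SELECT article_to,
                 row_number() OVER (PARTITION BY article_from
                                    ORDER BY score DESC, article_to) AS rn
          FROM article_relationship) AS a
    WHERE rn < 3
    ORDER BY article_to
""").fetchall()

print(len(with_rank), len(with_row_number))
```

Which behaviour is preferable depends on whether tied articles should all be shown or the list must be capped strictly.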
Adding columns from table articles is trivially done in the main query:
SELECT a.article_from, a.article_to, a.score, articles.*
FROM (
SELECT article_from, article_to, score,
rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
FROM article_relationship
WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
JOIN articles ON articles.id = a.article_to
WHERE a.rnk < 6
ORDER BY a.article_from, a.score DESC;
Version with join lateral:
select o.id as from_id, p.article_to as to_id, a.title, a.journal_id, a.pub_date_p from articles o
join lateral (
select i.article_to from article_relationship i
where i.article_from = o.id
order by score desc
limit 5
) p on true
INNER JOIN articles a on a.id = p.article_to
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;

TSQL show only first row

I have the following TSQL query:
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John' ORDER BY MyTable1.Date DESC
It retrieves a long list of Dates, but I only need the first one, the one in the first row.
How can I get it?
Thanks a ton!
In SQL Server you can use TOP:
SELECT TOP 1 MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
ORDER BY MyTable1.Date DESC
If you need to use DISTINCT, then you can use:
SELECT TOP 1 x.Date
FROM
(
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
) x
ORDER BY x.Date DESC
Or even:
SELECT MAX(MyTable1.Date)
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
--ORDER BY MyTable1.Date DESC
There are several options here. You can use TOP(1) as Taryn mentioned. But according to the docs, for the purpose of limiting the rows returned it is better to use OFFSET and FETCH:
We recommend that you use the OFFSET and FETCH clauses instead of the TOP clause to implement a query paging solution and limit the number of rows sent to a client application.
Using OFFSET and FETCH as a paging solution requires running the query one time for each "page" of data returned to the client application. For example, to return the results of a query in 10-row increments, you must execute the query one time to return rows 1 to 10 and then run the query again to return rows 11 to 20 and so on.
Given that, a solution to your problem using the OFFSET and FETCH approach could be:
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John' ORDER BY MyTable1.Date DESC
OFFSET 0 ROWS
FETCH NEXT 1 ROW ONLY