Indexes for optimising SQL Joins in Postgres

Indexes for optimising SQL Joins in Postgres - postgresql

Given the below query
SELECT * FROM A
INNER JOIN B ON (A.b_id = B.id)
WHERE (A.user_id = 'XXX' AND B.provider_id = 'XXX' AND A.type = 'PENDING')
ORDER BY A.created_at DESC LIMIT 1;
The variable values in the query are A.user_id and B.provider_id, the type is always queried on 'PENDING'.
I am planning to add a compound + partial index on A
A(user_id, created_at) where type = 'PENDING'
Also the number of records in A >> B.
Given A.user_id, B.provider_id, A.b_id all are foreign keys. Is there any way I can optimize the query?

Given that you are doing an inner join, I would first express the query as follows, with the join in the opposite direction:
SELECT *
FROM B
INNER JOIN A ON A.b_id = B.id
WHERE A.user_id = 'XXX' AND A.type = 'PENDING' AND
B.provider_id = 'XXX'
ORDER BY
A.created_at DESC
LIMIT 1;
Then I would add the following index to the A table:
CREATE INDEX idx_a ON A (user_id, type, created_at, b_id);
This four column index should cover the join from B to A, as well as the WHERE clause and also the ORDER BY sort at the end of the query. Note that we could probably also have left the query with the join order as you originally wrote above, and this index could still be used.

Related

How to get unique rows by one column but sort by the second

There is an example request in which there are several joins.
SELECT DISTINCT ON(a.id_1) 1, a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
ORDER BY a.id_1 desc
In this case, the query will work, sorting by unique values of id_1 will take place. But I need to sort by the column a.name. In this case, postresql will swear with the words ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions.
The following query can serve as a solution to the problem:
SELECT *
FROM(
SELECT DISTINCT ON(a.id_1) a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
)
ORDER_BY a.name desc
But in reality the database is very large and such a query is not optimal. Are there other ways to sort by the selected column while keeping one uniqueness?

Lateral query syntax

I'm trying to get lateral to work in a Postgres 9.5.3 query.
select b_ci."IdOwner",
ci."MinimumPlaces",
ci."MaximumPlaces",
(select count(*) from "LNK_Stu_CI" lnk
where lnk."FK_CourseInstanceId" = b_ci."Id") as "EnrolledStudents",
from "Course" c
join "DBObjectBases" b_c on c."Id" = b_c."Id"
join "DBObjectBases" b_ci on b_ci."IdOwner" = b_c."Id"
join "CourseInstance" ci on ci."Id" = b_ci."Id",
lateral (select ci."MaximumPlaces" - "EnrolledStudents") x
I want the right-most column to be the result of "MaximumPlaces" - "EnrolledStudents" for that row but am struggling to get it to work. At the moment PG is complaining that "EnrolledStudents" does not exist - which is exactly the point of "lateral", isn't it?
select b_ci."IdOwner",
ci."MinimumPlaces",
ci."MaximumPlaces",
(select count(*) from "LNK_Stu_CI" lnk
where lnk."FK_CourseInstanceId" = b_ci."Id") as "EnrolledStudents",
lateral (select "MaximumPlaces" - "EnrolledStudents") as "x"
from "Course" c
join "DBObjectBases" b_c on c."Id" = b_c."Id"
join "DBObjectBases" b_ci on b_ci."IdOwner" = b_c."Id"
join "CourseInstance" ci on ci."Id" = b_ci."Id"
If I try inlining the lateral clause (shown above) in the select it gets upset too and gives me a syntax error - so where does it go?
Thanks,
Adam.

You are missing the point with LATERAL. It can access columns in tables in the FROM clause, but not aliases defined in SELECT clause.
If you want to access alias defined in SELECT clause, you need to add another query level, either using a subquery in FROM clause (AKA derived table) or using a CTE (Common Table Expression). As CTE in PostgreSQL acts as an optimization fence, I strongly recommend going with subquery in this case, like:
select
-- get all columns on the inner query
t.*,
-- get your new expression based on the ones defined in the inner query
t."MaximumPlaces" - t."EnrolledStudents" AS new_alias
from (
select b_ci."IdOwner",
ci."MinimumPlaces",
ci."MaximumPlaces",
(select count(*) from "LNK_Stu_CI" lnk
where lnk."FK_CourseInstanceId" = b_ci."Id") as "EnrolledStudents",
from "Course" c
join "DBObjectBases" b_c on c."Id" = b_c."Id"
join "DBObjectBases" b_ci on b_ci."IdOwner" = b_c."Id"
join "CourseInstance" ci on ci."Id" = b_ci."Id"
) t

Row counts across aggregated tables

I have multiple tables in a postgres database that hold perfectly unique information. The information, when properly joined together in a query, will produce all every possible combination that I'm looking. The information I'm looking for are complete SKUs.
To generate a complete SKUs, this query produces the desired results:
Functional Query
SELECT
materials.code,
"part_base_parts".code as part_base_parts_id,
shanks.code AS shank_id,
measurements.description
FROM
"part_base_parts"
LEFT JOIN "part_types" ON "part_base_parts"."part_type_id" = "part_types"."id"
RIGHT JOIN "parts_to_shanks" ON "part_base_parts"."id" = "parts_to_shanks"."part_base_part_id"
RIGHT JOIN "parts_to_measurements" ON "part_base_parts"."id" = "parts_to_measurements"."part_base_part_id"
RIGHT JOIN "parts_to_materials" ON "part_base_parts"."id" = "parts_to_materials"."part_base_part_id"
JOIN materials ON "parts_to_materials"."material_id" = materials."id"
JOIN shanks ON "parts_to_shanks"."shank_id" = shanks."id"
JOIN measurements ON "parts_to_measurements"."measurement_id" = measurements."id"
ORDER BY
part_base_parts_id ASC,
materials.code ASC,
shank_id ASC,
measurements.description ASC
Given this query, I produce 32,640 records (without indexing applied) with a query time of .82 seconds. Something like this...
Given Output
code part_base_parts_id shank_id description
AA 5105 A 03.0
.
. 32,638 rows in here.
.
ST 6939 D 9/16
This is only getting me half way there, though. I need to take the results back from the query and produce the total number of counts from each column. So the result that I need to have would be:
Desired Results
code: AA - ###0
...
ST - ###0
part_base_parts_id: 5105 - ###0
...
6939 - ###0
shank_id: A - ###0
...
D - ###0
description: 03.0 - ###0
...
9/16 - ###0
Is there a way to produce the "desired results" from Postgres?

If you want them in rows then sure.
WITH cte AS(
SELECT
materials.code,
"part_base_parts".code as part_base_parts_id,
shanks.code AS shank_id,
measurements.description
FROM
"part_base_parts"
LEFT JOIN "part_types" ON "part_base_parts"."part_type_id" = "part_types"."id"
RIGHT JOIN "parts_to_shanks" ON "part_base_parts"."id" = "parts_to_shanks"."part_base_part_id"
RIGHT JOIN "parts_to_measurements" ON "part_base_parts"."id" = "parts_to_measurements"."part_base_part_id"
RIGHT JOIN "parts_to_materials" ON "part_base_parts"."id" = "parts_to_materials"."part_base_part_id"
JOIN materials ON "parts_to_materials"."material_id" = materials."id"
JOIN shanks ON "parts_to_shanks"."shank_id" = shanks."id"
JOIN measurements ON "parts_to_measurements"."measurement_id" = measurements."id"
ORDER BY
part_base_parts_id ASC,
materials.code ASC,
shank_id ASC,
measurements.description ASC
)
SELECT key, value, count(*)
FROM(
SELECT 'code' AS key, code AS value
FROM cte
UNION ALL
SELECT 'part_base_parts_id', code
FROM cte
UNION ALL
SELECT 'shank_id', shank_id
FROM cte
UNION ALL
SELECT 'description', description
FROM cte
) AS q
GROUP BY key, value
ORDER BY key, value

Postgres join not respecting outer where clause

In SQL Server, I know for sure that the following query;
SELECT things.*
FROM things
LEFT OUTER JOIN (
SELECT thingreadings.thingid, reading
FROM thingreadings
INNER JOIN things on thingreadings.thingid = things.id
ORDER BY reading DESC LIMIT 1) AS readings
ON things.id = readings.thingid
WHERE things.id = '1'
Would join against thingreadings only once the WHERE id = 1 had restricted the record set down. It left joins against just one row. However in order for performance to be acceptable in postgres, I have to add the WHERE id= 1 to the INNER JOIN things on thingreadings.thingid = things.id line too.
This isn't ideal; is it possible to force postgres to know that what I am joining against is only one row without explicitly adding the WHERE clauses everywhere?
An example of this problem can be seen here;
I am trying to recreate the following query in a more efficient way;
SELECT things.id, things.name,
(SELECT thingreadings.id FROM thingreadings WHERE thingid = things.id ORDER BY id DESC LIMIT 1),
(SELECT thingreadings.reading FROM thingreadings WHERE thingid = things.id ORDER BY id DESC LIMIT 1)
FROM things
WHERE id IN (1,2)
http://sqlfiddle.com/#!15/a172c/2

Not really sure why you did all that work. Isn't the inner query enough?
SELECT t.*
FROM thingreadings tr
INNER JOIN things t on tr.thingid = t.id AND t.id = '1'
ORDER BY tr.reading DESC
LIMIT 1;
sqlfiddle demo
When you want to select the latest value for each thingID, you can do:
SELECT t.*,a.reading
FROM things t
INNER JOIN (
SELECT t1.*
FROM thingreadings t1
LEFT JOIN thingreadings t2
ON (t1.thingid = t2.thingid AND t1.reading < t2.reading)
WHERE t2.thingid IS NULL
) a ON a.thingid = t.id
sqlfiddle demo
The derived table gets you the record with the most recent reading, then the JOIN gets you the information from things table for that record.

The where clause in SQL applies to the result set you're requesting, NOT to the join.
What your code is NOT saying: "do this join only for the ID of 1"...
What your code IS saying: "do this join, then pull records out of it where the ID is 1"...
This is why you need the inner where clause. Incidentally, I also think Filipe is right about the unnecessary code.

TSQL show only first row

I have the following TSQL query:
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John' ORDER BY MyTable1.Date DESC
It retrieves a long list of Dates, but I only need the first one, the one in the first row.
How can I get it?
Thanks a ton!

In SQL Server you can use TOP:
SELECT TOP 1 MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
ORDER BY MyTable1.Date DESC
If you need to use DISTINCT, then you can use:
SELECT TOP 1 x.Date
FROM
(
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
) x
ORDER BY x.Date DESC
Or even:
SELECT MAX(MyTable1.Date)
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
--ORDER BY MyTable1.Date DESC

There are several options here. You can use TOP(1) as Taryn mentioned. But according to docs for the purposes of limiting the rows returned it is better to use OFFSET and FETCH.
We recommend that you use the OFFSET and FETCH clauses instead of the TOP clause to implement a query paging solution and limit the number of rows sent to a client application.
Using OFFSET and FETCH as a paging solution requires running the query one time for each "page" of data returned to the client application. For example, to return the results of a query in 10-row increments, you must execute the query one time to return rows 1 to 10 and then run the query again to return rows 11 to 20 and so on.
Assuming, the solution for your problem using OFFSET and FETCH approach could be:
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John' ORDER BY MyTable1.Date DESC
OFFSET 0 ROWS
FETCH NEXT 1 ROW ONLY

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Indexes for optimising SQL Joins in Postgres - postgresql

Related

How to get unique rows by one column but sort by the second

Lateral query syntax

Row counts across aggregated tables

Postgres join not respecting outer where clause

TSQL show only first row

Categories

Resources