I'm trying to speed up a slow query in PostgreSQL. I have a table "element" and a table "relation".
The table "relation" links any item of the table "element" to other items of the same table "element".
Another table, "subtype", describes the type of each element. For clarity, I list only the most important columns here.
Table: element(id, id_subtype, identification_number)
Table: relation(id, type_source, id_source, type_destination, id_destination)
Table: subtype(id, name, code)
I want to list all entries of the table "element" with the following columns:
Id, identification_number
a concatenated string of all its relations to other elements
a concatenated string of all its relations to other elements of the subtype with code = "zone"
a concatenated string of all its relations to other elements of the subtype with code = "secteur"
This is the query I have so far:
SELECT
e.id, e.name,
string_agg(distinct(elem_identification_number), ', ') as rel_element_string,
string_agg(distinct(elem_zone_identification_number), ', ') as rel_zone_element_string,
string_agg(distinct(elem_sector_identification_number), ', ') as rel_sector_element_string
FROM(
SELECT e.id,
CASE
WHEN elem.id is null THEN null
ELSE concat(s.name, ' ', elem.identification_number)
END AS elem_identification_number,
CASE
WHEN s_zone.id is null THEN null
ELSE elem_zone.identification_number
END AS elem_zone_identification_number,
CASE
WHEN s_sector.id is null THEN null
ELSE elem_sector.identification_number
END AS elem_sector_identification_number
FROM element e
LEFT JOIN relation re ON re.id_source = e.id AND re.type_source = 'element' AND re.type_destination = 'element'
LEFT JOIN element elem ON re.id_destination = elem.id
LEFT JOIN subtype s ON elem.id_subtype = s.id
LEFT JOIN relation re_zone ON re_zone.id_source = e.id AND re_zone.type_source = 'element' AND re_zone.type_destination = 'element' AND re_zone.is_deleted = false
LEFT JOIN element elem_zone ON re_zone.id_destination = elem_zone.id
LEFT JOIN subtype s_zone ON elem_zone.id_subtype = s_zone.id AND s_zone.code = 'zone'
LEFT JOIN relation re_sector ON re_sector.id_source = e.id AND re_sector.type_source = 'element' AND re_sector.type_destination = 'element' AND re_sector.is_deleted = false
LEFT JOIN element elem_sector ON re_sector.id_destination = elem_sector.id
LEFT JOIN subtype s_sector ON elem_sector.id_subtype = s_sector.id AND s_sector.code = 'secteur'
WHERE e.is_deleted = false AND e.id_subtype = 18
UNION ALL
/* Same query but with reversed id_source - id_destination */
) as e
GROUP BY id, e.identification_number, ...
ORDER BY id DESC;
The EXPLAIN output for the full query (with all columns) is here:
https://explain.depesz.com/s/Lk9h
I also have 2 indexes on the table "relation":
CREATE INDEX idx_relation
ON public.relation USING btree
(id_chantier ASC NULLS LAST, type_source COLLATE pg_catalog."default" ASC NULLS LAST, id_source ASC NULLS LAST);
CREATE INDEX idx_relation_dest
ON public.relation USING btree
(id_chantier ASC NULLS LAST, type_destination COLLATE pg_catalog."default" ASC NULLS LAST, id_destination ASC NULLS LAST);
Any idea how I can improve the query?
Thank you!
You have a combinatorial explosion here. For example, if each of your string_agg calls produces a list of 100 things for each e, the joins first build 100^3, or a million, rows per e before DISTINCT compacts them back down again.
The way to avoid that is to write not one 10-way join but three correlated subqueries, each with a 3-way join plus a reference to the outer table. Something like:
select e.*,
(select string_agg(...) from relation, element, subtype ...) rel_element_string,
(select string_agg(...) from relation, element, subtype ...) rel_zone_element_string,
(select string_agg(...) from relation, element, subtype ...) rel_sector_element_string
from element e
WHERE e.is_deleted = false AND e.id_subtype = 18
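To make the row counts concrete, here is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables are stripped down to two columns each, the data is made up, and SQLite's group_concat takes the place of string_agg, so treat it as an illustration of the shape of the problem rather than a benchmark.

```python
import sqlite3


def build_db(n_relations):
    """One source element (id 1) related to n_relations other elements."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE element (id INTEGER PRIMARY KEY, identification_number TEXT);
        CREATE TABLE relation (id_source INTEGER, id_destination INTEGER);
        """
    )
    conn.execute("INSERT INTO element VALUES (1, 'E-1')")
    for dest in range(2, n_relations + 2):
        conn.execute("INSERT INTO element VALUES (?, ?)", (dest, f"E-{dest}"))
        conn.execute("INSERT INTO relation VALUES (1, ?)", (dest,))
    return conn


def rows_before_aggregation(conn):
    """Rows the single big join builds for element 1 before any GROUP BY."""
    sql = """
        SELECT count(*)
        FROM element e
        LEFT JOIN relation r1 ON r1.id_source = e.id
        LEFT JOIN relation r2 ON r2.id_source = e.id
        LEFT JOIN relation r3 ON r3.id_source = e.id
        WHERE e.id = 1
    """
    return conn.execute(sql).fetchone()[0]


def aggregate_with_subquery(conn):
    """Correlated subquery: the aggregate only sees its own small join."""
    sql = """
        SELECT e.id,
               (SELECT group_concat(d.identification_number, ', ')
                  FROM relation r
                  JOIN element d ON d.id = r.id_destination
                 WHERE r.id_source = e.id) AS rel_element_string
        FROM element e
        WHERE e.id = 1
    """
    return conn.execute(sql).fetchall()


conn = build_db(10)
print(rows_before_aggregation(conn))       # 10 * 10 * 10 joined rows
print(len(aggregate_with_subquery(conn)))  # a single row for the element
```

With 10 relations the three-way self-join already builds 1000 rows for a single element, while the subquery form returns one row and each subquery touches only the 10 related rows.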
Let's say I have
sequelize.query('SELECT associations FROM users WHERE id = :id')
associations is a JSONB array column.
The output looks like this:
[
{
"role": 2,
"shop_id": 1,
"admin_id": 1,
"manager_id": null
}
]
I'd like to loop through the array and look up those associations by their ids.
I'd like to perform the whole thing in the same query.
I have a role table, a shop table, and a users table.
Progress so far: all the columns are coming out as null.
If associations is a column of type jsonb[], use unnest(associations) to expand the first level of elements.
Then you can try something like this, assuming that all the ids are of type integer:
// A template literal lets the SQL span several lines without escaping the quotes.
sequelize.query(`
SELECT *
FROM users
CROSS JOIN LATERAL unnest(associations) AS j
LEFT JOIN role AS r
ON (j->>'role')::integer = r.id
LEFT JOIN shop AS s
ON (j->>'shop_id')::integer = s.id
LEFT JOIN users AS a
ON (j->>'admin_id')::integer = a.id
LEFT JOIN users AS m
ON (j->>'manager_id')::integer = m.id
-- qualify id: role, shop, and both users aliases also have an id column
WHERE users.id = :id`
)
Given the below query
SELECT * FROM A
INNER JOIN B ON (A.b_id = B.id)
WHERE (A.user_id = 'XXX' AND B.provider_id = 'XXX' AND A.type = 'PENDING')
ORDER BY A.created_at DESC LIMIT 1;
The values that vary between calls are A.user_id and B.provider_id; the type is always 'PENDING'.
I am planning to add a compound, partial index on A:
A(user_id, created_at) where type = 'PENDING'
Also, the number of records in A >> B.
A.user_id, B.provider_id, and A.b_id are all foreign keys. Is there any way I can optimize the query?
Given that you are doing an inner join, I would first express the query as follows, with the join in the opposite direction:
SELECT *
FROM B
INNER JOIN A ON A.b_id = B.id
WHERE A.user_id = 'XXX' AND A.type = 'PENDING' AND
B.provider_id = 'XXX'
ORDER BY
A.created_at DESC
LIMIT 1;
Then I would add the following index to the A table:
CREATE INDEX idx_a ON A (user_id, type, created_at, b_id);
This four column index should cover the join from B to A, as well as the WHERE clause and also the ORDER BY sort at the end of the query. Note that we could probably also have left the query with the join order as you originally wrote above, and this index could still be used.
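As a small sanity check of the rewritten query together with that index, here is a sketch using Python's sqlite3 module in place of PostgreSQL. The column types, the sample rows, and the helper names latest_pending and query_plan are all made up for illustration; SQLite's planner is not PostgreSQL's, so the plan it prints only hints at what EXPLAIN would show on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE B (id INTEGER PRIMARY KEY, provider_id TEXT);
    CREATE TABLE A (
        id INTEGER PRIMARY KEY,
        user_id TEXT,
        b_id INTEGER REFERENCES B(id),
        type TEXT,
        created_at TEXT
    );
    -- The four-column index suggested above.
    CREATE INDEX idx_a ON A (user_id, type, created_at, b_id);

    -- Made-up sample rows.
    INSERT INTO B VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO A VALUES
        (1, 'U1', 1, 'PENDING', '2024-01-01'),
        (2, 'U1', 1, 'PENDING', '2024-03-01'),
        (3, 'U1', 2, 'PENDING', '2024-05-01'),
        (4, 'U1', 1, 'DONE',    '2024-06-01'),
        (5, 'U2', 1, 'PENDING', '2024-07-01');
    """
)

QUERY = """
    SELECT A.id, A.created_at
    FROM B
    INNER JOIN A ON A.b_id = B.id
    WHERE A.user_id = ? AND A.type = 'PENDING' AND B.provider_id = ?
    ORDER BY A.created_at DESC
    LIMIT 1
"""


def latest_pending(user_id, provider_id):
    """Newest PENDING row of A for this user and provider, or None."""
    return conn.execute(QUERY, (user_id, provider_id)).fetchone()


def query_plan(user_id, provider_id):
    """Human-readable plan steps; the detail text is the last column."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (user_id, provider_id)
    ).fetchall()
    return [row[-1] for row in rows]


print(latest_pending("U1", "P1"))
for step in query_plan("U1", "P1"):
    print(step)
```

On the real tables, running EXPLAIN (ANALYZE) on both the original and the rewritten query is the way to confirm whether PostgreSQL actually picks the index.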
There is an example request in which there are several joins.
SELECT DISTINCT ON(a.id_1) 1, a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
ORDER BY a.id_1 desc
In this case the query works and returns one row per unique value of id_1. But I need to sort by the column a.name, and then PostgreSQL raises ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions.
The following query can serve as a solution to the problem:
SELECT *
FROM (
SELECT DISTINCT ON (a.id_1) a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.id_2 = c.id
WHERE a.deleted_at IS NULL
) AS sub -- the derived table needs an alias; the outer query sees only its columns
ORDER BY sub.name DESC
But in reality the database is very large and such a query is not optimal. Are there other ways to sort by the selected column while keeping the rows unique on id_1?
I need some help working out the SQL. Unfortunately the dialect is Sybase ASE, which I'm not too familiar with. In MS SQL I would use a window function like RANK() or ROW_NUMBER() in a subquery and join to those results ...
Here's what I'm trying to resolve
TABLE A
Id
1
2
3
TABLE B
Id,Type
1,A
1,B
1,C
2,A
2,B
3,A
3,C
4,B
4,C
I would like to return 1 row for each Id. If the Id has a type 'A' record, that is the one to display. If it only has other types, it doesn't matter which one is shown, but it cannot be null (some arbitrary ordering, such as alphabetical, is fine for prioritizing the "other" types).
Results:
1, A
2, A
3, A
4, B
A regular left join (ON A.id = B.id AND B.type = 'A') ALMOST returns what I am looking for; however, it returns null for the type when I want the 'next available' type.
You can use an INNER JOIN on a subquery (FirstTypeResult) that returns the minimum type per Id. Because 'A' sorts before the other types, Min picks it whenever it exists.
E.g.:
SELECT TABLEA.[Id], FirstTypeResult.[Type]
FROM TABLEA
JOIN (
SELECT [Id], Min([Type]) As [Type]
FROM TABLEB
GROUP BY [Id]
) FirstTypeResult ON FirstTypeResult.[Id] = TABLEA.[Id]
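To check this against the sample data from the question, here is a quick sketch using Python's sqlite3 module instead of Sybase ASE. The bracketed identifiers are dropped, the column types are guesses, and first_type_per_id is just a name made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TABLEA (Id INTEGER);
    CREATE TABLE TABLEB (Id INTEGER, Type TEXT);
    -- Sample data copied from the question.
    INSERT INTO TABLEA VALUES (1), (2), (3);
    INSERT INTO TABLEB VALUES
        (1, 'A'), (1, 'B'), (1, 'C'),
        (2, 'A'), (2, 'B'),
        (3, 'A'), (3, 'C'),
        (4, 'B'), (4, 'C');
    """
)


def first_type_per_id():
    """One row per Id of TABLEA with its alphabetically smallest Type."""
    sql = """
        SELECT TABLEA.Id, FirstTypeResult.Type
        FROM TABLEA
        JOIN (
            SELECT Id, MIN(Type) AS Type
            FROM TABLEB
            GROUP BY Id
        ) FirstTypeResult ON FirstTypeResult.Id = TABLEA.Id
        ORDER BY TABLEA.Id
    """
    return conn.execute(sql).fetchall()


print(first_type_per_id())
```

Note that in the sample data Id 4 exists only in TABLE B, so the join against TABLE A leaves it out. Running the grouped subquery on its own would also return 4, B.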
I am trying to create a view that pairs the name and cid of each country with the name and cid of each of its neighbours.
Here's the schema:
CREATE TABLE country (
cid INTEGER PRIMARY KEY,
cname VARCHAR(20) NOT NULL,
height INTEGER NOT NULL,
population INTEGER NOT NULL);
CREATE TABLE neighbour (
country INTEGER REFERENCES country(cid) ON DELETE RESTRICT,
neighbor INTEGER REFERENCES country(cid) ON DELETE RESTRICT,
length INTEGER NOT NULL,
PRIMARY KEY(country, neighbor));
My query:
create view neighbour_pair as (
select c1.cid, c1.cname, c2.cid, c2.cname
from neighbour n join country c1 on c1.cid = n.country
join country c2 on n.neighbor = c2.cid);
I am getting error code 42701 which means that there is a duplicate column.
The actual error message I am getting is:
ERROR: column "cid" specified more than once
********** Error **********
ERROR: column "cid" specified more than once
SQL state: 42701
I am unsure how to get around the error, since I WANT each pair of neighbouring countries with both country names and both cids.
Nevermind. I edited the first line of the query and changed the column names
create view neighbour_pair as
select c1.cid as c1cid, c1.cname as c1name, c2.cid as c2cid, c2.cname as c2name
from neighbour n join country c1 on c1.cid = n.country
join country c2 on n.neighbor = c2.cid;
I ran into a similar issue recently. I had a query like:
CREATE VIEW pairs AS
SELECT p.id, p.name,
(SELECT count(id) from results
where winner = p.id),
(SELECT count(id) from results
where winner = p.id OR loser = p.id)
FROM players p LEFT JOIN matches m ON p.id = m.id
GROUP BY 1,2;
The error was telling me: ERROR: column "count" specified more than once. The query WAS working via psycopg2, however when I brought it into a .sql file for testing the error arose.
I realized I just needed to alias the 2 count subqueries:
CREATE VIEW pairs AS
SELECT p.id, p.name,
(SELECT count(id) from results
where winner = p.id) as wins,
(SELECT count(id) from results
where winner = p.id OR loser = p.id) as matches
FROM players p LEFT JOIN matches m ON p.id = m.id
GROUP BY 1,2;
You can alias the columns with AS.
For example, your view could be written as follows:
create view neighbour_pair as
(
select c1.cid
, c1.cname
, c2.cid AS cid_c2
, c2.cname AS cname_c2
from neighbour n
join country c1 on c1.cid = n.country
join country c2 on n.neighbor = c2.cid
);
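Here is a minimal sketch of the aliased view, run with Python's sqlite3 module as a stand-in for PostgreSQL. The two countries and their border length are invented sample data. SQLite may be more lenient than PostgreSQL about duplicate column names in a view, so this only checks that the aliased form yields four distinct column names; it does not reproduce error 42701.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (
        cid INTEGER PRIMARY KEY,
        cname VARCHAR(20) NOT NULL,
        height INTEGER NOT NULL,
        population INTEGER NOT NULL);
    CREATE TABLE neighbour (
        country INTEGER REFERENCES country(cid) ON DELETE RESTRICT,
        neighbor INTEGER REFERENCES country(cid) ON DELETE RESTRICT,
        length INTEGER NOT NULL,
        PRIMARY KEY(country, neighbor));

    -- Invented sample rows.
    INSERT INTO country VALUES (1, 'Alpha', 100, 1000), (2, 'Beta', 200, 2000);
    INSERT INTO neighbour VALUES (1, 2, 50);

    -- Every output column gets its own name, so none is specified twice.
    CREATE VIEW neighbour_pair AS
        SELECT c1.cid AS c1cid, c1.cname AS c1name,
               c2.cid AS c2cid, c2.cname AS c2name
        FROM neighbour n
        JOIN country c1 ON c1.cid = n.country
        JOIN country c2 ON n.neighbor = c2.cid;
    """
)


def view_columns_and_rows():
    """Column names and rows of the view, as the client sees them."""
    cur = conn.execute("SELECT * FROM neighbour_pair")
    names = [d[0] for d in cur.description]
    return names, cur.fetchall()


names, rows = view_columns_and_rows()
print(names)
print(rows)
```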