I'm trying to speed up a slow query in PostgreSQL. I have a table "element" and a table "relation".
The "relation" table links any item of the "element" table to other items of the same table.
Another table, "subtype", describes the type of each element. For clarity, I list only the most important columns here.
Table: element(id, id_subtype, identification_number)
Table: relation(id, type_source, id_source, type_destination, id_destination)
Table: subtype(id, name, code)
I want to list all entries of the table "element" with the following columns:
Id, identification_number
a concatenated string of all its relations to other elements
a concatenated string of all its relations to other elements of the subtype with code = "zone"
a concatenated string of all its relations to other elements of the subtype with code = "secteur"
I have this query so far:
SELECT
e.id, e.name,
string_agg(distinct(elem_identification_number), ', ') as rel_element_string,
string_agg(distinct(elem_zone_identification_number), ', ') as rel_zone_element_string,
string_agg(distinct(elem_sector_identification_number), ', ') as rel_sector_element_string
FROM(
SELECT e.id,
CASE
WHEN elem.id is null THEN null
ELSE concat(s.name, ' ', elem.identification_number)
END AS elem_identification_number,
CASE
WHEN s_zone.id is null THEN null
ELSE elem_zone.identification_number
END AS elem_zone_identification_number,
CASE
WHEN s_sector.id is null THEN null
ELSE elem_sector.identification_number
END AS elem_sector_identification_number
FROM element e
LEFT JOIN relation re ON re.id_source = e.id AND re.type_source = 'element' AND re.type_destination = 'element'
LEFT JOIN element elem ON re.id_destination = elem.id
LEFT JOIN subtype s ON elem.id_subtype = s.id
LEFT JOIN relation re_zone ON re_zone.id_source = e.id AND re_zone.type_source = 'element' AND re_zone.type_destination = 'element' AND re_zone.is_deleted = false
LEFT JOIN element elem_zone ON re_zone.id_destination = elem_zone.id
LEFT JOIN subtype s_zone ON elem_zone.id_subtype = s_zone.id AND s_zone.code = 'zone'
LEFT JOIN relation re_sector ON re_sector.id_source = e.id AND re_sector.type_source = 'element' AND re_sector.type_destination = 'element' AND re_sector.is_deleted = false
LEFT JOIN element elem_sector ON re_sector.id_destination = elem_sector.id
LEFT JOIN subtype s_sector ON elem_sector.id_subtype = s_sector.id AND s_sector.code = 'secteur'
WHERE e.is_deleted = false AND e.id_subtype = 18
UNION ALL
/* Same query but with reversed id_source - id_destination */
) as e
GROUP BY id, e.identification_number, ...
ORDER BY id DESC;
The plan of the full query (with all columns), as reported by EXPLAIN, is here:
https://explain.depesz.com/s/Lk9h
I also have two indexes on the "relation" table:
CREATE INDEX idx_relation
ON public.relation USING btree
(id_chantier ASC NULLS LAST, type_source COLLATE pg_catalog."default" ASC NULLS LAST, id_source ASC NULLS LAST);
CREATE INDEX idx_relation_dest
ON public.relation USING btree
(id_chantier ASC NULLS LAST, type_destination COLLATE pg_catalog."default" ASC NULLS LAST, id_destination ASC NULLS LAST);
Any idea how I can improve the query?
Thank you!
You have a combinatorial explosion here. For example, if each of your string_aggs produces a list of 100 things for each e, the join first builds 100^3, or a million, rows per e before the distinct compacts them back down again.
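The arithmetic can be checked directly with generate_series. This is only an illustration: three series of 100 values stand in for three independent one-to-many joins from a single row of e, and the aliases a, b, c and n are made up for the example.

```sql
-- Three independent 100-row joins multiply before any DISTINCT can run.
SELECT count(*)            AS joined_rows,      -- 100 * 100 * 100 = 1000000
       count(DISTINCT a.n) AS distinct_values   -- 100
FROM generate_series(1, 100) AS a(n)
CROSS JOIN generate_series(1, 100) AS b(n)
CROSS JOIN generate_series(1, 100) AS c(n);
```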
The way to avoid that is to not write one 10-way join, but rather write 3 correlated subqueries where each subquery has a 3-way join plus a reference to the outer table. Something like:
select e.*,
(select string_agg(...) from relation, element, subtype ...) rel_element_string,
(select string_agg(...) from relation, element, subtype ...) rel_zone_element_string,
(select string_agg(...) from relation, element, subtype ...) rel_sector_element_string
from element e
WHERE e.is_deleted = false AND e.id_subtype = 18
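Filled in for the forward direction only, that could look roughly like the sketch below. It assumes the table and column names from the question, that identification_number is a text column, and that the same is_deleted filters apply. The reversed id_source/id_destination half of the UNION ALL is left out.

```sql
SELECT e.id,
       e.identification_number,
       -- all related elements, labelled with their subtype name
       (SELECT string_agg(DISTINCT concat(s.name, ' ', elem.identification_number), ', ')
          FROM relation re
          JOIN element elem ON elem.id = re.id_destination
          JOIN subtype s    ON s.id = elem.id_subtype
         WHERE re.id_source = e.id
           AND re.type_source = 'element'
           AND re.type_destination = 'element') AS rel_element_string,
       -- related elements whose subtype code is 'zone'
       (SELECT string_agg(DISTINCT elem.identification_number, ', ')
          FROM relation re
          JOIN element elem ON elem.id = re.id_destination
          JOIN subtype s    ON s.id = elem.id_subtype AND s.code = 'zone'
         WHERE re.id_source = e.id
           AND re.type_source = 'element'
           AND re.type_destination = 'element'
           AND re.is_deleted = false) AS rel_zone_element_string,
       -- related elements whose subtype code is 'secteur'
       (SELECT string_agg(DISTINCT elem.identification_number, ', ')
          FROM relation re
          JOIN element elem ON elem.id = re.id_destination
          JOIN subtype s    ON s.id = elem.id_subtype AND s.code = 'secteur'
         WHERE re.id_source = e.id
           AND re.type_source = 'element'
           AND re.type_destination = 'element'
           AND re.is_deleted = false) AS rel_sector_element_string
FROM element e
WHERE e.is_deleted = false
  AND e.id_subtype = 18
ORDER BY e.id DESC;
```

Each subquery returns NULL when an element has no matching relation, which matches what the LEFT JOIN version produces. Since both existing indexes lead with id_chantier, which this query never filters on, they may not help these lookups much.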
There are three tables: businesses, categories and categorizations.
CREATE TABLE businesses (
id SERIAL PRIMARY KEY,
name varchar(40)
);
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name varchar(40)
);
CREATE TABLE categorizations (
business_id integer,
category_id integer
);
So business has many categories through categorizations.
If I want to select businesses without categories, I would do something
like this:
SELECT businesses.* FROM businesses
LEFT OUTER JOIN categorizations
ON categorizations.business_id = businesses.id
LEFT OUTER JOIN categories
ON categories.id = categorizations.category_id
GROUP BY businesses.id
HAVING count(categories.id) = 0;
The question is: How do I select businesses without categories AND
businesses with category named "Media" in one query?
You can use a union:
SELECT businesses.*
FROM businesses
LEFT OUTER JOIN categorizations
ON categorizations.business_id = businesses.id
GROUP BY businesses.id
HAVING count(categorizations.business_id) = 0
UNION
SELECT businesses.*
FROM businesses
INNER JOIN categorizations
ON categorizations.business_id = businesses.id
INNER JOIN categories
ON categories.id = categorizations.category_id
WHERE categories.name = 'Media';
Note that in the first query (businesses with no categories at all) you don't need to join as far as categories; the junction table alone shows the lack of a category. Because UNION removes duplicate rows, a business that has the 'Media' category more than once is still returned only once; if you change it to UNION ALL, add DISTINCT to the second query.
I would try:
SELECT b.* FROM businesses b
LEFT JOIN categorizations cz ON b.id = cz.business_id
LEFT JOIN categories cs ON cz.category_id = cs.id
WHERE COALESCE(cs.name, 'Media') = 'Media';
... in the hope that businesses with no categorizations would get NULL entries on their joins.
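The double-negation trick works for this kind of selection. It returns every business that has no category other than 'Media', which covers businesses with no categories at all but excludes those that have 'Media' alongside another category: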
SELECT * FROM businesses b
WHERE NOT EXISTS (
SELECT *
FROM categorizations bc
JOIN categories c ON bc.category_id = c.id
WHERE bc.business_id = b.id
AND c.name <> 'Media'
);
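Another way to get both groups in one pass is to OR together the two conditions as EXISTS tests. The sketch below uses temporary copies of the tables and made-up sample rows, so the names Acme, Daily News and Gym are purely illustrative.

```sql
-- Temporary stand-ins for the question's tables (hypothetical sample data).
CREATE TEMP TABLE businesses (id integer PRIMARY KEY, name varchar(40));
CREATE TEMP TABLE categories (id integer PRIMARY KEY, name varchar(40));
CREATE TEMP TABLE categorizations (business_id integer, category_id integer);

INSERT INTO businesses VALUES (1, 'Acme'), (2, 'Daily News'), (3, 'Gym');
INSERT INTO categories VALUES (1, 'Media'), (2, 'Sports');
INSERT INTO categorizations VALUES (2, 1), (2, 2), (3, 2);

SELECT b.*
FROM businesses b
WHERE NOT EXISTS (          -- no categories at all
        SELECT 1
        FROM categorizations cz
        WHERE cz.business_id = b.id)
   OR EXISTS (              -- at least one category named 'Media'
        SELECT 1
        FROM categorizations cz
        JOIN categories c ON c.id = cz.category_id
        WHERE cz.business_id = b.id
          AND c.name = 'Media')
ORDER BY b.id;
-- returns Acme (no categories) and Daily News (Media plus Sports), but not Gym
```

Because each business row is tested once rather than joined, no DISTINCT or GROUP BY is needed to avoid duplicates.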
I am trying to create a view that pairs each country's name and cid with the name and cid of each of its neighbours.
Here's the schema:
CREATE TABLE country (
cid INTEGER PRIMARY KEY,
cname VARCHAR(20) NOT NULL,
height INTEGER NOT NULL,
population INTEGER NOT NULL);
CREATE TABLE neighbour (
country INTEGER REFERENCES country(cid) ON DELETE RESTRICT,
neighbor INTEGER REFERENCES country(cid) ON DELETE RESTRICT,
length INTEGER NOT NULL,
PRIMARY KEY(country, neighbor));
My query:
create view neighbour_pair as (
select c1.cid, c1.cname, c2.cid, c2.cname
from neighbour n join country c1 on c1.cid = n.country
join country c2 on n.neighbor = c2.cid);
I am getting error code 42701 which means that there is a duplicate column.
The actual error message I am getting is:
ERROR: column "cid" specified more than once
********** Error **********
ERROR: column "cid" specified more than once
SQL state: 42701
I am unsure how to get around the error, since I WANT each pair of neighbouring countries with both country names and both cids.
Never mind. I edited the first line of the query and aliased the columns so that each name is unique:
create view neighbour_pair as
select c1.cid as c1cid, c1.cname as c1name, c2.cid as c2cid, c2.cname as c2name
from neighbour n join country c1 on c1.cid = n.country
join country c2 on n.neighbor = c2.cid;
I ran into a similar issue recently. I had a query like:
CREATE VIEW pairs AS
SELECT p.id, p.name,
(SELECT count(id) from results
where winner = p.id),
(SELECT count(id) from results
where winner = p.id OR loser = p.id)
FROM players p LEFT JOIN matches m ON p.id = m.id
GROUP BY 1,2;
The error was telling me: ERROR: column "count" specified more than once. The query WAS working via psycopg2, however when I brought it into a .sql file for testing the error arose.
I realized I just needed to alias the 2 count subqueries:
CREATE VIEW pairs AS
SELECT p.id, p.name,
(SELECT count(id) from results
where winner = p.id) as wins,
(SELECT count(id) from results
where winner = p.id OR loser = p.id) as matches
FROM players p LEFT JOIN matches m ON p.id = m.id
GROUP BY 1,2;
You can alias the columns with AS so that every column name in the view is unique.
For example, your view could be as follows:
create view neighbour_pair as
(
select c1.cid
, c1.cname
, c2.cid AS cid_c2
, c2.cname AS cname_c2
from neighbour n
join country c1 on c1.cid = n.country
join country c2 on n.neighbor = c2.cid
);
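As an alternative to aliasing each expression, CREATE VIEW also accepts a column name list after the view name. This is a sketch against the country and neighbour tables from the question; the four view column names are made up for illustration.

```sql
-- The column list names the view's output columns positionally.
CREATE VIEW neighbour_pair (country_cid, country_name, neighbour_cid, neighbour_name) AS
SELECT c1.cid, c1.cname, c2.cid, c2.cname
FROM neighbour n
JOIN country c1 ON c1.cid = n.country
JOIN country c2 ON c2.cid = n.neighbor;

-- Every output column now has a distinct name:
SELECT country_name, neighbour_name
FROM neighbour_pair;
```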
I'm trying to map the results of a query to JSON using the row_to_json() function that was added in PostgreSQL 9.2.
I'm having trouble figuring out the best way to represent joined rows as nested objects (1:1 relations)
Here's what I've tried (setup code: tables, sample data, followed by query):
-- some test tables to start out with:
create table role_duties (
id serial primary key,
name varchar
);
create table user_roles (
id serial primary key,
name varchar,
description varchar,
duty_id int, foreign key (duty_id) references role_duties(id)
);
create table users (
id serial primary key,
name varchar,
email varchar,
user_role_id int, foreign key (user_role_id) references user_roles(id)
);
DO $$
DECLARE duty_id int;
DECLARE role_id int;
begin
insert into role_duties (name) values ('Script Execution') returning id into duty_id;
insert into user_roles (name, description, duty_id) values ('admin', 'Administrative duties in the system', duty_id) returning id into role_id;
insert into users (name, email, user_role_id) values ('Dan', 'someemail#gmail.com', role_id);
END$$;
The query itself:
select row_to_json(row)
from (
select u.*, ROW(ur.*::user_roles, ROW(d.*::role_duties)) as user_role
from users u
inner join user_roles ur on ur.id = u.user_role_id
inner join role_duties d on d.id = ur.duty_id
) row;
I found that if I used ROW(), I could separate the resulting fields out into a child object, but it seems limited to a single level. I can't add more AS XXX aliases, which I think I would need in this case.
I am afforded column names, because I cast to the appropriate record type, for example with ::user_roles, in the case of that table's results.
Here's what that query returns:
{
"id":1,
"name":"Dan",
"email":"someemail#gmail.com",
"user_role_id":1,
"user_role":{
"f1":{
"id":1,
"name":"admin",
"description":"Administrative duties in the system",
"duty_id":1
},
"f2":{
"f1":{
"id":1,
"name":"Script Execution"
}
}
}
}
What I want to do is generate JSON for joins (again 1:1 is fine) in a way where I can add joins, and have them represented as child objects of the parents they join to, i.e. like the following:
{
"id":1,
"name":"Dan",
"email":"someemail#gmail.com",
"user_role_id":1,
"user_role":{
"id":1,
"name":"admin",
"description":"Administrative duties in the system",
"duty_id":1,
"duty":{
"id":1,
"name":"Script Execution"
}
}
}
Update: In PostgreSQL 9.4 this improves a lot with the introduction of json_build_object, json_object and json_build_array (to_json has been available since 9.3), though it's verbose due to the need to name all the fields explicitly:
select
json_build_object(
'id', u.id,
'name', u.name,
'email', u.email,
'user_role_id', u.user_role_id,
'user_role', json_build_object(
'id', ur.id,
'name', ur.name,
'description', ur.description,
'duty_id', ur.duty_id,
'duty', json_build_object(
'id', d.id,
'name', d.name
)
)
)
from users u
inner join user_roles ur on ur.id = u.user_role_id
inner join role_duties d on d.id = ur.duty_id;
For older versions, read on.
It isn't limited to a single level, it's just a bit painful. You can't alias composite rowtypes using AS, so you need to use an aliased subquery expression or CTE to achieve the effect:
select row_to_json(row)
from (
select u.*, urd AS user_role
from users u
inner join (
select ur.*, d
from user_roles ur
inner join role_duties d on d.id = ur.duty_id
) urd(id,name,description,duty_id,duty) on urd.id = u.user_role_id
) row;
produces, via http://jsonprettyprint.com/:
{
"id": 1,
"name": "Dan",
"email": "someemail#gmail.com",
"user_role_id": 1,
"user_role": {
"id": 1,
"name": "admin",
"description": "Administrative duties in the system",
"duty_id": 1,
"duty": {
"id": 1,
"name": "Script Execution"
}
}
}
You will want to use array_to_json(array_agg(...)) when you have a 1:many relationship, btw.
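For instance, to nest every user under their role (one role, many users), something along these lines should work with the test tables from the question. It is a sketch; the alias r and the output key users are arbitrary choices.

```sql
SELECT row_to_json(r)
FROM (
    SELECT ur.id,
           ur.name,
           -- one JSON array per role, built from the matching user rows
           (SELECT array_to_json(array_agg(u))
            FROM users u
            WHERE u.user_role_id = ur.id) AS users
    FROM user_roles ur
) r;
```

A role without any users gets "users":null here, because array_agg over zero rows yields NULL.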
The above query should ideally be able to be written as:
select row_to_json(
ROW(u.*, ROW(ur.*, d AS duty) AS user_role)
)
from users u
inner join user_roles ur on ur.id = u.user_role_id
inner join role_duties d on d.id = ur.duty_id;
... but PostgreSQL's ROW constructor doesn't accept AS column aliases. Sadly.
Thankfully, both forms optimize to the same plan. Compare the plans:
The nested subquery version; vs
The latter nested ROW constructor version with the aliases removed so it executes
Because CTEs are optimisation fences (at least in PostgreSQL versions before 12), rephrasing the nested subquery version to use chained CTEs (WITH expressions) may not perform as well, and won't result in the same plan. In this case you're kind of stuck with ugly nested subqueries until we get some improvements to row_to_json or a way to override the column names in a ROW constructor more directly.
Anyway, in general, the principle is that where you want to create a json object with columns a, b, c, and you wish you could just write the illegal syntax:
ROW(a, b, c) AS outername(name1, name2, name3)
you can instead use scalar subqueries returning row-typed values:
(SELECT x FROM (SELECT a AS name1, b AS name2, c AS name3) x) AS outername
Or:
(SELECT x FROM (SELECT a, b, c) AS x(name1, name2, name3)) AS outername
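A small runnable instance of the first form, with literal values standing in for a, b and c. The names id and t and the literals themselves are placeholders.

```sql
SELECT row_to_json(t)
FROM (
    SELECT 1 AS id,
           -- the inner SELECT names the fields; the outer SELECT x returns them as one row value
           (SELECT x
            FROM (SELECT 'a'::text AS name1, 2 AS name2, true AS name3) x
           ) AS outername
) t;
```

The outername key of the result holds an object with the keys name1, name2 and name3 rather than the f1, f2, f3 that a bare ROW() would give.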
Additionally, keep in mind that you can compose json values without additional quoting, e.g. if you put the output of a json_agg within a row_to_json, the inner json_agg result won't get quoted as a string, it'll be incorporated directly as json.
e.g. in the arbitrary example:
SELECT row_to_json(
(SELECT x FROM (SELECT
1 AS k1,
2 AS k2,
(SELECT json_agg( (SELECT x FROM (SELECT 1 AS a, 2 AS b) x) )
FROM generate_series(1,2) ) AS k3
) x),
true
);
the output is:
{"k1":1,
"k2":2,
"k3":[{"a":1,"b":2},
{"a":1,"b":2}]}
Note that the json_agg product, [{"a":1,"b":2}, {"a":1,"b":2}], hasn't been escaped again, as text would be.
This means you can compose json operations to construct rows; you don't always have to create hugely complex PostgreSQL composite types and then call row_to_json on the output.
I am adding this solution because the accepted answer does not cover N:N relationships, i.e. collections of collections of objects.
If you have N:N relationships, the WITH clause is your friend.
In my example, I would like to build a tree view of the following hierarchy.
A Requirement - Has - TestSuites
A Test Suite - Contains - TestCases.
The following query represents the joins.
SELECT r.id as reqId, r.description as reqDesc,
s.id as suiteId, s."Name" as suiteName,
tc.id as tcId, tc."Title" as testCaseTitle
from "Requirement" r
inner join "Has" h on r.id = h.requirementid
inner join "TestSuite" s on s.id = h.testsuiteid
inner join "Contains" c on c.testsuiteid = s.id
inner join "TestCase" tc on tc.id = c.testcaseid;
Since aggregate calls cannot be nested within a single query level, you need to build the result in stages using WITH:
with testcases as (
select c.testsuiteid,ts."Name" , tc.id, tc."Title" from "TestSuite" ts
inner join "Contains" c on c.testsuiteid = ts.id
inner join "TestCase" tc on tc.id = c.testcaseid
),
requirements as (
select r.id as reqId ,r.description as reqDesc , s.id as suiteId
from "Requirement" r
inner join "Has" h on r.id = h.requirementid
inner join "TestSuite" s on s.id = h.testsuiteid
)
, suitesJson as (
select testcases.testsuiteid,
json_agg(
json_build_object('tc_id', testcases.id,'tc_title', testcases."Title" )
) as suiteJson
from testcases
group by testcases.testsuiteid,testcases."Name"
),
allSuites as (
select has.requirementid,
json_agg(
json_build_object('ts_id', suitesJson.testsuiteid,'name',s."Name" , 'test_cases', suitesJson.suiteJson )
) as suites
from suitesJson inner join "TestSuite" s on s.id = suitesJson.testsuiteid
inner join "Has" has on has.testsuiteid = s.id
group by has.requirementid
),
allRequirements as (
select json_agg(
json_build_object('req_id', r.id ,'req_description',r.description , 'test_suites', allSuites.suites )
) as suites
from allSuites inner join "Requirement" r on r.id = allSuites.requirementid
)
select * from allRequirements
It builds the JSON in small pieces, aggregating one level of the hierarchy in each WITH clause.
Result:
[
{
"req_id": 1,
"req_description": "<character varying>",
"test_suites": [
{
"ts_id": 1,
"name": "TestSuite",
"test_cases": [
{
"tc_id": 1,
"tc_title": "TestCase"
},
{
"tc_id": 2,
"tc_title": "TestCase2"
}
]
},
{
"ts_id": 2,
"name": "TestSuite",
"test_cases": [
{
"tc_id": 2,
"tc_title": "TestCase2"
}
]
}
]
},
{
"req_id": 2,
"req_description": "<character varying> 2 ",
"test_suites": [
{
"ts_id": 2,
"name": "TestSuite",
"test_cases": [
{
"tc_id": 2,
"tc_title": "TestCase2"
}
]
}
]
}
]
My suggestion for maintainability over the long term is to use a VIEW to build the coarse version of your query, and then use a function as below:
CREATE OR REPLACE FUNCTION fnc_query_prominence_users( )
RETURNS json AS $$
DECLARE
d_result json;
BEGIN
SELECT ARRAY_TO_JSON(
ARRAY_AGG(
ROW_TO_JSON(
CAST(ROW(users.*) AS prominence.users)
)
)
)
INTO d_result
FROM prominence.users;
RETURN d_result;
END; $$
LANGUAGE plpgsql
SECURITY INVOKER;
In this case, the object prominence.users is a view. Since I selected users.*, I will not have to update this function if I need to update the view to include more fields in a user record.
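On PostgreSQL 9.3 or later, json_agg can probably replace the array_to_json(array_agg(row_to_json(...))) chain. Below is a sketch of the same idea as a plain SQL function; the function name fnc_query_prominence_users_sql is invented here, and prominence.users is the view mentioned above.

```sql
CREATE OR REPLACE FUNCTION fnc_query_prominence_users_sql()
RETURNS json AS $$
    -- json_agg turns each row of the view into a JSON object
    -- and collects the objects into a single JSON array
    SELECT json_agg(u) FROM prominence.users u;
$$
LANGUAGE sql
STABLE
SECURITY INVOKER;
```

As with the plpgsql version, adding fields to the view needs no change to the function, since the whole row u is aggregated.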