Postgres: make repeated subqueries more efficient?

Postgres: make repeated subqueries more efficient? - postgresql

I have a Postgres 9.6 database with two tables, template and project.
template
id integer
name varchar
project
id integer
name varchar
template_id integer (foreign key)
is_deleted boolean
is_listed boolean
I want to get a list of all templates, with a count of the projects for each template, and a count of the deleted projects for each template, i.e. this type of output
id,name,num_projects,num_deleted,num_listed
1,"circle",19,2,7
2,"square",10,0,8
I have a query like this:
select id, name,
(select count(*) from project where template_id=template.id)
as num_projects,
(select count(*) from project where template_id=template.id and is_deleted)
as num_deleted,
(select count(*) from project where template_id=template.id and is_listed)
as num_listed
from template;
However, looking at the EXPLAIN, this isn't very efficient as the large project table is queried separately three times.
Is there any way to get Postgres to query and iterate over the project table just once?

The query could be rewritten as:
SELECT t.id, t.name,
COUNT(p.template_id) as num_projects,
COUNT(p.template_id) FILTER(WHERE p.is_deleted) as num_deleted,
COUNT(p.template_id) FILTER(WHERE p.is_listed) as num_listed
FROM template t
LEFT JOIN project p
ON p.template_id=t.id
GROUP BY t.id, t.name

Sometimes, doing the aggregation before joining is more efficient then aggregating the result of a join.
SELECT t.id, t.name,
coalesce(p.num_projects, 0) as num_projects,
coalesce(p.num_deleted, 0) as num_deleted,
coalesce(p.num_listed, 0) as num_listed
FROM template t
LEFT JOIN (
SELECT template_id,
count(*) as num_projects
count(*) filter (where p.is_deleted) as num_deleted,
count(*) filter (where p.is_listed) as num_listed
FROM project
GROUP BY template_id
) p ON p.template_id = t.id

Related

Add Column in table with value partition by group

My table is somethingg like
CREATE TABLE table1
(
_id text,
name text,
data_type int,
data_value int,
data_date timestamp -- insertion time
);
Now due to a system bug, many duplicate entries are created and I need to remove those duplicated and keep only unique entries excluding data_date because it is a system generated date.
My query to do that is something like:
DELETE FROM table1 A
USING ( SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value
HAVING count(data_date) > 1) B
WHERE A._id = B._id
AND A.name = B.name
AND A.data_type = B.data_type
AND A.data_value = B.data_value
AND A.data_date != B.min_date;
However this query works, having millions of records in the table, I want a faster way for it. My idea is to create a new column with value as partition by [_id, name, data_type, data_value] or columns which are in group by. However, I could not find the way to create such column.
I would appretiate if any one may suggest a way to create such column.
Edit 1:
There is another thing to add, I don't want to use CTE or subquery for updating this new column because it will be same as my existing query.

The best way is simply creating a new table without duplicated records:
CREATE...
SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value;
Alternatively, you can create a rank and then filter, but a subquery is needed.
RANK() OVER (PARTITION BY your_variables ORDER BY data_date ASC) r
And then filter r=1.

How to use OPENJSON on multiple rows

I have a temp table with multiple rows in it and each row has a column called Categories; which contains a very simple json array of ids for categories in a different table.
A few example rows of the temp table:
Id Name Categories
---------------------------------------------------------------------------------------------
'539f7e28-143e-41bb-8814-a7b93b846007' Test 1 ["category1Id", "category2Id", "category3Id"]
'f29e2ecf-6e37-4aa9-aa56-4a351d298bfc' Test 2 ["category1Id", "category2Id"]
'34e41a0a-ad92-4cd7-bf5c-8df6bfd6ed5c' Test 3 NULL
Now what I would like to do is to select all of the category ids from all of the rows in the temp table.
What I have is the following and it's not working as it's giving me the error of :
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
SELECT
c.Id
,c.[Name]
,c.Color
FROM
dbo.Category as c
WHERE
c.Id in (SELECT [value] FROM OPENJSON((SELECT Categories FROM #TempTable)))
and c.IsDeleted = 0
Which I guess it makes sense that's failing on that because I'm selecting multiple rows and needing to parse each row's respective category ids json. I'm just not sure what to do/change to give me the results that I want. Thank you in advance for any help.

You'd need to use CROSS APPLY like so:
SELECT id ,
name ,
t.Value AS category_id
FROM #temp
CROSS APPLY OPENJSON(categories, '$') t;
And then, you can JOIN to your Categories table using the category_id column, something like this:
SELECT id ,
name ,
t.Value AS category_id,
c.*
FROM #temp
CROSS APPLY OPENJSON(categories, '$') t
LEFT JOIN Categories c ON c.Id = t.Value

Firebird 2.5 Removing Rows with Duplicate Fields

I am trying to removing duplicate values which, for some reason, was imported in a specific Table.
There is no Primary Key in this table.
There is 27797 unique records.
Select distinct txdate, plunumber from itemaudit
Give me the correct records, but only displays the txdate, plunumber of course.
If it was possible to select all the fields but only select the distinct of txdate,plunumber I could export the values, delete the duplicated ones and re-import.
Or if its possible to delete the distinct values from the entire table.
If you select the distinct of all fields the value is incorrect.

To get all information on the duplicates, you simply need to query all information for the duplicate rows using a JOIN:
SELECT b.*
FROM (SELECT COUNT(*) as cnt, txdate, plunumber
FROM itemaudit
GROUP BY txdate, plunumber
HAVING COUNT(*) > 1) a
INNER JOIN itemaudit b ON a.txdate = b.txdate AND a.plunumber = b.plunumber

DELETE FROM itemaudit t1
WHERE EXISTS (
SELECT 1 FROM itemaudit t2
WHERE t1.txdate = t2.txdate and t1.plunumber = t2.plunumber
AND t1.RDB$DB_KEY < t2.RDB$DB_KEY
);

How to get fields and added in group by in PostreSQL8.4?

I am selecting column used in group by and count, and query looks something like
SELECT s.country, count(*) AS posts_ct
FROM store s
JOIN store_post_map sp ON sp.store_id = s.id
GROUP BY 1;
However, I want to select some more fields, like store name or store address from store table where count is max, but I don't to include that in group by clause.

For instance, to get the stores with the highest post-count per country:
SELECT DISTINCT ON (s.country)
s.country, s.store_id, s.name, sp.post_ct
FROM store s
JOIN (
SELECT store_id, count(*) AS post_ct
FROM store_post_map
GROUP BY store_id
) sp ON sp.store_id = s.id
ORDER BY s.country, sp.post_ct DESC
Add any number of columns from store to the SELECT list.
Details about this query style in this related answer:
Select first row in each GROUP BY group?
Reply to comment
This produces the count per country and picks (one of) the store(s) with the highest post-count:
SELECT DISTINCT ON (s.country)
s.country, s.store_id, s.name
,sum(post_ct) OVER (PARTITION BY s.country) AS post_ct_for_country
FROM store s
JOIN (
SELECT store_id, count(*) AS post_ct
FROM store_post_map
GROUP BY store_id
) sp ON sp.store_id = s.id
ORDER BY s.country, sp.post_ct DESC;
This works because the window function sum() is applied before DISTINCT ON per definition.

TSQL Update Query behaving unexpectedly

I have a nested select query that is returning the proper amount of rows. The query builds a recordset and compares it to a table and returns the records in the query that are not in the table.
I converted the select query to an update query. I am trying to populate the table with the rows returned from the query. When I run the update query it is returning with zero rows to update. I dont understand why because the select query is returning record and I am using the same code in the update query.
Thanks
Select Query: (This is returning several records)
Select *
From
(SELECT DISTINCT
ProductClass,SalProductClass.[Description],B.Branch,B.BranchDesc,B.Salesperson,B.Name,
CAST(0 AS FLOAT) AS Rate,'N' AS Split
FROM (SELECT SalBranch.Branch,SalBranch.[Description] AS BranchDesc,A.Salesperson,A.Name
FROM (SELECT DISTINCT
Salesperson,Name
FROM SalSalesperson
) A
CROSS JOIN SalBranch
) B
CROSS JOIN SalProductClass
) C
Left Outer Join RateComm On
RateComm.ProductClass = C.ProductClass and
RateComm.Branch = C.Branch And RateComm.Salesperson = C.Salesperson
Where RateComm.ProductClass is Null
Update Query: (This is returning zero records)
UPDATE RateComm
SET RateComm.ProductClass=C.ProductClass,RateComm.ProdClassDesc=C.ProdClassDesc,
RateComm.Branch=C.Branch,RateComm.BranchDesc=C.BranchDesc,RateComm.Salesperson=C.Salesperson,
RateComm.Name=C.Name,RateComm.Rate=C.Rate,RateComm.Split=C.Split
FROM (SELECT DISTINCT
ProductClass,SalProductClass.[Description] AS ProdClassDesc,B.Branch,B.BranchDesc,B.Salesperson,B.Name,
CAST(0 AS FLOAT) AS Rate,'N' AS Split
FROM (SELECT SalBranch.Branch,SalBranch.[Description] AS BranchDesc,A.Salesperson,A.Name
FROM (SELECT DISTINCT
Salesperson,Name
FROM SalSalesperson
) A
CROSS JOIN SalBranch
) B
CROSS JOIN SalProductClass
) C
LEFT OUTER JOIN RateComm ON C.ProductClass=RateComm.ProductClass AND
C.Salesperson=RateComm.Salesperson AND C.Branch=RateComm.Branch
WHERE RateComm.ProductClass IS NULL

It's difficult to update what doesn't exist. Have you tried an INSERT query instead?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Postgres: make repeated subqueries more efficient? - postgresql

The query could be rewritten as: SELECT t.id, t.name, COUNT(p.template_id) as num_projects, COUNT(p.template_id) FILTER(WHERE p.is_deleted) as num_deleted, COUNT(p.template_id) FILTER(WHERE p.is_listed) as num_listed FROM template t LEFT JOIN project p ON p.template_id=t.id GROUP BY t.id, t.name

Related

Add Column in table with value partition by group

How to use OPENJSON on multiple rows

Firebird 2.5 Removing Rows with Duplicate Fields

How to get fields and added in group by in PostreSQL8.4?

TSQL Update Query behaving unexpectedly

Categories

Resources