Query optimization: how to reduce planning time and execution time in PostgreSQL - postgresql-9.4

How can I reduce the planning time and execution time of the following query in PostgreSQL?
SQL query:
select s.id,state_name state, d.id no_districts,b.id no_blocks,v.id no_villages,district_name district,block_tehsil_name block,village_name village ,corr_vs,non_corr_vs,corr_gw,non_corr_gw,corr_sw,non_corr_sw
from ref_state s left join ref_district d on d.ref_state_id=s.id
left join ref_block_tehsil b on d.id=b.ref_district_id
left join ref_village v on v.ref_block_tehsil_id=b.id
left join (select ref_village_id ,coalesce(sum(tot_vs),0)-coalesce(sum(non_corr_vs),0) corr_vs,sum(non_corr_vs)non_corr_vs
from (select ref_village_id,count(vs.id)tot_vs,(select count(id) from mi_census_village_schedule_validation vsv where vsv.ref_village_id=vs.ref_village_id)non_corr_vs from mi_census_village_schedule vs group by ref_village_id )vs1 group by ref_village_id ) vs on vs.ref_village_id=v.id
left join(select ref_village_id ,coalesce(sum(tot_gw),0)- coalesce(sum(non_corr_gw),0) corr_gw,sum(non_corr_gw)non_corr_gw from (select ref_village_id,count(gws.id)tot_gw,(select count(id) from mi_census_ground_water_scheme_validation gwsv where gwsv.ref_village_id=gws.ref_village_id)non_corr_gw from mi_census_ground_water_scheme gws group by ref_village_id )gws1 group by ref_village_id ) gws on gws.ref_village_id=v.id
left join (select ref_village_id,coalesce(sum(tot_sw),0)- coalesce(sum(non_corr_sw),0) corr_sw,sum(non_corr_sw)non_corr_sw from(select ref_village_id,count(sws.id)tot_sw,(select count(id) from mi_census_surface_water_scheme_validation swsv where swsv.ref_village_id=sws.ref_village_id)non_corr_sw from mi_census_surface_water_scheme sws group by ref_village_id )sws1 group by ref_village_id ) sws on sws.ref_village_id=v.id
where s.id=30 and d.id=21 and b.id=127 and v.id=632

Several techniques can help improve the performance of SQL queries. Follow SQL best practices such as:
Create proper indexes so that queries cause minimal table scans.
Avoid using functions in predicates.
Avoid using a wildcard (%) at the beginning of a predicate.
Avoid unnecessary columns in the SELECT clause.
Use an inner join instead of an outer join where possible. Your query uses several left joins, which also affects its performance; because the WHERE clause filters on d.id, b.id and v.id, those joins already behave like inner joins.
Use DISTINCT and UNION only when necessary.
Use an ORDER BY clause only when a sorted result set is required. The database has to sort the result set, and sorting is one of the most expensive operations in SQL execution.
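As a rough illustration of the indexing advice, the sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table is a cut-down, hypothetical version of one of the validation tables, and the index name idx_vsv_village is made up for the example. In PostgreSQL itself, EXPLAIN ANALYZE plays the role of EXPLAIN QUERY PLAN here and also reports planning time and execution time.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE village_schedule_validation ("
    "id INTEGER PRIMARY KEY, ref_village_id INTEGER, note TEXT)"
)
conn.executemany(
    "INSERT INTO village_schedule_validation (ref_village_id, note) VALUES (?, ?)",
    [(n % 1000, "row") for n in range(5000)],
)

query = "SELECT note FROM village_schedule_validation WHERE ref_village_id = 632"


def plan(sql):
    """Return the planner's description of how it intends to run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index yet: the whole table is scanned

conn.execute(
    "CREATE INDEX idx_vsv_village ON village_schedule_validation (ref_village_id)"
)
plan_after = plan(query)   # the lookup can now go through idx_vsv_village

print(plan_before)
print(plan_after)
```

The exact plan wording differs between engines and versions, so compare plans before and after a change rather than relying on specific text.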

Related

Using LIMIT Statement in INNER JOIN (postgreSQL)

I am having trouble using the LIMIT Statement. I would really appreciate your help.
I am trying to INNER JOIN three tables and use the LIMIT statement to only query a few lines because the tables are so huge.
So, basically, this is what I am trying to accomplish:
SELECT *
FROM ((scheme1.table1
INNER JOIN scheme1.table2
ON scheme1.table1.column1 = scheme1.table2.column1 LIMIT 1)
INNER JOIN scheme1.table3
ON scheme1.table1.column1 = scheme1.table3.column1)
LIMIT 1;
I get a syntax error on the LIMIT from the first INNER JOIN. Why? How can I limit the results I get from each of the INNER JOINs? If I only use the second "LIMIT 1" at the bottom, I will query the entire table.
Thanks a lot!
LIMIT can only be applied to queries, not to a table reference. So you need to use a complete SELECT query for table2 in order to be able to use the LIMIT clause:
SELECT *
FROM schema1.table1 as t1
INNER JOIN (
select *
from schema1.table2
order by ???
limit 1
) as t2 ON t1.column1 = t2.column1
INNER JOIN schema1.table3 as t3 ON t1.column1 = t3.column1
order by ???
limit 1;
Note that LIMIT without an ORDER BY typically makes no sense as results of a query have no inherent sort order. You should think about applying the necessary ORDER BY in the derived table (aka sub-query) and the outer query to get consistent and deterministic results.
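A small runnable sketch of the derived-table pattern, using Python's sqlite3 module as a stand-in database. The tables, the extra label column, and the choice to order by column1 are assumptions made for the example, since the original leaves the ORDER BY columns open.

```python
import sqlite3

# In-memory SQLite database as a stand-in; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER);
    CREATE TABLE table2 (column1 INTEGER, label TEXT);
    CREATE TABLE table3 (column1 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (3, 'c'), (1, 'a'), (2, 'b');
    INSERT INTO table3 VALUES (1), (2), (3);
""")

rows = conn.execute("""
    SELECT t1.column1, t2.label
    FROM table1 AS t1
    INNER JOIN (
        SELECT column1, label
        FROM table2
        ORDER BY column1      -- makes "the first row" well defined
        LIMIT 1               -- legal here: this is a complete SELECT
    ) AS t2 ON t1.column1 = t2.column1
    INNER JOIN table3 AS t3 ON t1.column1 = t3.column1
    ORDER BY t1.column1
    LIMIT 1
""").fetchall()

print(rows)
```

Only the single row kept by the inner LIMIT takes part in the join, so the outer query never sees the rest of table2.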

Should I do ORDER BY twice when selecting from subquery?

I have SQL query (code below) which selects some rows from subquery. In subquery I perform ORDER BY.
The question is: will order of subquery be preserved in parent query?
Is there some spec/document or something which proves that?
SELECT sub.id, sub.name, ot.field
FROM (SELECT t.id, t.name
FROM table t
WHERE t.something > 10
ORDER BY t.id
LIMIT 25
) sub
LEFT JOIN other_table ot ON ot.table_id = sub.id
/** order by id? **/
"will order of subquery be preserved in parent query"
It might happen, but you cannot rely on it.
For example, if the optimizer decides to use a hash join between your derived table and other_table then the order of the derived table will not be preserved.
If you want a guaranteed sort order, then you have to use an order by in the outer query as well.
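A minimal sketch of ordering in both places, run against SQLite through Python's sqlite3 module as a stand-in. The table name items replaces the placeholder name from the question, and the data and LIMIT value are invented for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, something INTEGER);
    CREATE TABLE other_table (table_id INTEGER, field TEXT);
    INSERT INTO items VALUES (3, 'c', 30), (1, 'a', 5), (2, 'b', 20), (4, 'd', 40);
    INSERT INTO other_table VALUES (4, 'w'), (2, 'x'), (3, 'y');
""")

rows = conn.execute("""
    SELECT sub.id, sub.name, ot.field
    FROM (SELECT t.id, t.name
          FROM items t
          WHERE t.something > 10
          ORDER BY t.id       -- decides WHICH rows the LIMIT keeps
          LIMIT 2
         ) sub
    LEFT JOIN other_table ot ON ot.table_id = sub.id
    ORDER BY sub.id           -- decides the order of the final result
""").fetchall()

print(rows)
```

The inner ORDER BY is still needed so that LIMIT picks a deterministic set of rows; the outer one is what guarantees the order the client sees.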

Write a sql query not using correlated query

I am trying to rewrite a sql query not using correlated query.
This is the query:
SELECT DISTINCT v.vendor_name, i.invoice_number, i.invoice_date, i.invoice_total
FROM vendors v JOIN invoices i
ON i.vendor_id = v.vendor_id
AND i.invoice_date = (SELECT MIN(invoice_date)
FROM invoices
WHERE vendor_id = v.vendor_id)
I tried many ways, but I am always getting stuck with this query:
I don't know how to integrate columns invoice_number and invoice_total in this resultset.
SELECT vendor_name, MIN(invoice_date)
FROM vendors JOIN invoices USING (vendor_id)
GROUP BY vendor_name
Can anyone help me, please?
One approach would be to use the analytic function rank:
SELECT DISTINCT vendor_name, invoice_number, invoice_date, invoice_total
FROM (SELECT v.vendor_name,
i.invoice_number,
i.invoice_date,
i.invoice_total,
rank() over (partition by v.vendor_id
order by i.invoice_date asc) rnk
FROM vendors v
JOIN invoices i
ON i.vendor_id = v.vendor_id) ranked
WHERE rnk = 1
If you are looking to improve the performance of your query, I'd strongly question whether you need the DISTINCT since that forces an extra sort. Frequently, developers use DISTINCT when they're really missing some join condition to properly eliminate the duplicate rows.
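To check that the rank-based rewrite returns the same rows as the correlated version, here is a small sketch using Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or later). The vendor and invoice rows are invented, and an ORDER BY is added to both queries only so the two result lists can be compared.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE invoices (invoice_number TEXT, vendor_id INTEGER,
                           invoice_date TEXT, invoice_total REAL);
    INSERT INTO vendors VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES
        ('A-1', 1, '2020-01-05', 100.0),
        ('A-2', 1, '2020-03-01', 250.0),
        ('G-1', 2, '2020-02-10', 75.0),
        ('G-2', 2, '2020-02-10', 80.0);
""")

# Original form: correlated subquery evaluated per vendor.
correlated = conn.execute("""
    SELECT v.vendor_name, i.invoice_number, i.invoice_date, i.invoice_total
    FROM vendors v JOIN invoices i
      ON i.vendor_id = v.vendor_id
     AND i.invoice_date = (SELECT MIN(invoice_date)
                           FROM invoices
                           WHERE vendor_id = v.vendor_id)
    ORDER BY v.vendor_name, i.invoice_number
""").fetchall()

# Rewrite: rank invoices per vendor by date and keep rank 1.
ranked = conn.execute("""
    SELECT vendor_name, invoice_number, invoice_date, invoice_total
    FROM (SELECT v.vendor_name, i.invoice_number, i.invoice_date, i.invoice_total,
                 rank() OVER (PARTITION BY v.vendor_id
                              ORDER BY i.invoice_date ASC) AS rnk
          FROM vendors v
          JOIN invoices i ON i.vendor_id = v.vendor_id) AS r
    WHERE rnk = 1
    ORDER BY vendor_name, invoice_number
""").fetchall()

print(ranked == correlated)
print(ranked)
```

Because rank() gives tied rows the same rank, a vendor with two invoices on its earliest date returns both, which matches what the MIN(invoice_date) comparison does.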

How does COUNT(*) behave in an inner join

Take this query:
SELECT c.CustomerID, c.AccountNumber, COUNT(*) AS CountOfOrders,
SUM(s.TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderheader AS s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;
I expected COUNT(*) to count the rows in Sales.Customer but to my surprise it counts the number of rows in the joined table.
Any idea why this is? Also, is there a way to be explicit in specifying which table COUNT() should operate on?
Query Processing Order...
The FROM clause is processed before the SELECT clause. That is to say, by the time SELECT comes into play, there is only one (virtual) table it is selecting from: the individual tables after they have been joined (JOIN), filtered (WHERE), etc.
If you just want to count over the one table, then you might try a couple of things...
COUNT(DISTINCT table1.id)
Or turn the table you want to count into a sub-query with count() inside of it
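The difference between the two counts can be seen in a small sketch, again using Python's sqlite3 module as a stand-in. The simplified customer and sales_order tables and their rows are assumptions for the example, not the AdventureWorks schema from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, account_number TEXT);
    CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY,
                              customer_id INTEGER, total_due REAL);
    INSERT INTO customer VALUES (1, 'AW1'), (2, 'AW2');
    INSERT INTO sales_order VALUES
        (10, 1, 5.0), (11, 1, 7.0), (12, 1, 8.0), (13, 2, 1.0);
""")

rows = conn.execute("""
    SELECT c.customer_id,
           COUNT(*)                      AS joined_rows,   -- rows of the joined result
           COUNT(DISTINCT c.customer_id) AS customers,     -- rows from customer only
           SUM(s.total_due)              AS sum_total_due
    FROM customer AS c
    INNER JOIN sales_order AS s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(rows)
```

Customer 1 has three orders, so COUNT(*) sees three joined rows in that group while COUNT(DISTINCT c.customer_id) sees one customer.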

Nested select statement in FROM clause? Inner Join statements? or just table name?

I'm building a query that needs data from 5 tables.
I've been told by a DBA in the past that specifying a list of columns vs getting all columns (*) is preferred from some performance/memory aspect.
I've also been told that the database performs a JOIN operation behind the scenes when there's a list of tables in the FROM clause, to create one table (or view).
The existing database has very little data at the moment, as we're at a very initial point. So not sure I can measure the performance hit in practice.
I am not a database pro. I can get what data I need. The dilemma is, at what price.
Added: At the moment I'm working with MS SQL Server 2008 R2.
My questions are:
Is there a performance difference and why, between the following:
a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit)
b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)?
c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anything? or is it just extra select statements that create more work for the server and for us)?
Is there yet a better way to do this?
Below are two queries that both get the slice of data that I need. One includes more nested select statements.
I apologize if they are not written in a standard form or helplessly overcomplicated - hopefully you can decipher. I try to keep them organized as much as possible.
Insights would be most appreciated as well.
Thanks for checking this out.
5 tables: devicepool, users, trips, TripTracker, and order
Query 1 (more select statements):
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
(
SELECT
tripid,
[status] tripstatus,
stops,
devid,
groupid
FROM
trips
)
AS [base2]
ON base.devid = base2.devid AND base2.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
(
SELECT
[status] [orderstatus],
[address] [destaddress],
[tripid],
stopnumber orderstopnumber
FROM [order]
)
AS [orders]
ON orders.orderstopnumber = tracker.stopnumber)
Query 2:
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
trips
ON base.devid = trips.devid AND trips.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
[order]
ON [order].stopnumber = tracker.stopnumber)
"Is there a performance difference and why, between the following:
a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit)
b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)?
c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anything? or is it just extra select statements that create more work for the server and for us)?"
a) and b) should result in the same query plan (although this is db-specific). b) is much preferred over a) for portability and readability. c) is a horrible idea that hurts readability and, if anything, will result in worse performance. Let us never speak of it again.
Is there yet a better way to do this?
b) is the standard approach. In general, writing the plainest ANSI SQL will result in the best performance, as it allows the query parser to easily understand what you are trying to do. Trying to outsmart the compiler with tricks may work in a given situation, but does not mean that it will still work when the cardinality or amount of data changes, or the database engine is upgraded. So, avoid doing that unless you are absolutely forced to.
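One way to compare forms a) and b) on a concrete engine is to run both and look at the rows and plans they produce. The sketch below does this with Python's sqlite3 module as a stand-in; the three tables, their columns, and the join keys are hypothetical, and what SQLite does here says nothing definite about SQL Server 2008 R2.

```python
import sqlite3

# In-memory SQLite database as a stand-in; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE tbl2 (id INTEGER PRIMARY KEY, tbl1_id INTEGER, y TEXT);
    CREATE TABLE tbl3 (id INTEGER PRIMARY KEY, tbl2_id INTEGER, z TEXT);
    INSERT INTO tbl1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO tbl2 VALUES (1, 1, 'c'), (2, 2, 'd');
    INSERT INTO tbl3 VALUES (1, 1, 'e'), (2, 2, 'f');
""")

# a) comma-separated table list with the join conditions in WHERE
implicit_sql = """
    SELECT tbl1.x, tbl2.y, tbl3.z
    FROM tbl1, tbl2, tbl3
    WHERE tbl2.tbl1_id = tbl1.id AND tbl3.tbl2_id = tbl2.id
    ORDER BY tbl1.id
"""
# b) explicit INNER JOIN ... ON
explicit_sql = """
    SELECT tbl1.x, tbl2.y, tbl3.z
    FROM tbl1
    INNER JOIN tbl2 ON tbl2.tbl1_id = tbl1.id
    INNER JOIN tbl3 ON tbl3.tbl2_id = tbl2.id
    ORDER BY tbl1.id
"""

implicit_rows = conn.execute(implicit_sql).fetchall()
explicit_rows = conn.execute(explicit_sql).fetchall()


def plan(sql):
    """Return the plan steps the engine reports for the statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


print(implicit_rows == explicit_rows)
print(plan(implicit_sql))
print(plan(explicit_sql))
```

On SQL Server the equivalent check would be to compare the execution plans of the two statements, ideally against realistic data volumes rather than a near-empty database.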