Selecting from a view takes more than 30 minutes - T-SQL

I am working on making this view fast enough to return its result set in a reasonable time. At the moment it takes more than 30 minutes, goes parallel, and causes all sorts of pain with increased CPU time. I have identified the problem query, but I can't figure out how to cut the execution time, either by rewriting the query or by adding an appropriate index if needed. We already have a clustered index on client_id and a nonclustered index on the hash key column in both tables. The joined tables are also large: about 238 million rows in work_orders and 287,011,570 rows in s_inspections.
select
    wo.client_id,
    wo.work_orders_hash_key,
    wo.work_order_number,
    wo.work_order_id,
    si.inspection_id,
    si.inspection_name,
    si.inspection_detail,
    si.master_inspection_id,
    si.master_inspection_detail,
    si.status_id,
    si.exception,
    si.inspection_order,
    si.comment,
    si.[procedure_id],
    si.[flag_id],
    si.[asset_id],
    si.[asset_name],
    si.[inspection_status],
    si.[is_removed],
    si.[response],
    row_number() over(partition by si.work_orders_hash_key, si.inspection_id order by si.dss_version desc) rnk
from
    datavault.dbo.h_work_orders wo with (readuncommitted)
    join datavault.dbo.s_inspections si with (readuncommitted)
        on wo.client_id = si.client_id
        and wo.work_orders_hash_key = si.work_orders_hash_key
where
    wo.client_id in (7700876368663, 8800387996408)
Below is the estimated execution plan. The query runs for so long that I couldn't capture the actual execution plan.
https://www.brentozar.com/pastetheplan/?id=ryLzvNwUN
Any help would be greatly appreciated.

Your compute scalar is 59% of your query cost.
I would guess it's this line:
row_number() over(partition by si.work_orders_hash_key, si.inspection_id order by si.dss_version desc) rnk
It's estimating 159014000000000 rows!
Whack this line (it's a lot of work just to return a row number) and run it again.

Maybe this will keep you in business, since the row_number() was the issue. Try:
;with x as (
    select
        wo.client_id,
        wo.work_orders_hash_key,
        wo.work_order_number,
        wo.work_order_id,
        si.inspection_id,
        si.inspection_name,
        si.inspection_detail,
        si.master_inspection_id,
        si.master_inspection_detail,
        si.status_id,
        si.exception,
        si.inspection_order,
        si.comment,
        si.[procedure_id],
        si.[flag_id],
        si.[asset_id],
        si.[asset_name],
        si.[inspection_status],
        si.[is_removed],
        si.[response],
        si.dss_version
    from
        datavault.dbo.h_work_orders wo with (readuncommitted)
        join datavault.dbo.s_inspections si with (readuncommitted)
            on wo.client_id = si.client_id
            and wo.work_orders_hash_key = si.work_orders_hash_key
    where
        wo.client_id in (7700876368663, 8800387996408)
)
select
    x.client_id,
    x.work_orders_hash_key,
    x.work_order_number,
    x.work_order_id,
    x.inspection_id,
    x.inspection_name,
    x.inspection_detail,
    x.master_inspection_id,
    x.master_inspection_detail,
    x.status_id,
    x.exception,
    x.inspection_order,
    x.comment,
    x.[procedure_id],
    x.[flag_id],
    x.[asset_id],
    x.[asset_name],
    x.[inspection_status],
    x.[is_removed],
    x.[response],
    row_number() over(partition by x.work_orders_hash_key, x.inspection_id order by x.dss_version desc) rnk
from x;
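The following minimal sketch checks that the rewrite does not change the result. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which assumes SQLite 3.25 or later for window functions. The two tables hold a few hypothetical rows with only the columns that the join and ROW_NUMBER() need. The readuncommitted hints are dropped because SQLite has no equivalent. The sketch only shows that both forms return the same rows. It says nothing about whether SQL Server picks a different plan for them.

```python
import sqlite3

# Toy stand-ins for h_work_orders and s_inspections (hypothetical rows,
# only the columns needed for the join and the ROW_NUMBER()).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE h_work_orders (client_id INTEGER, work_orders_hash_key TEXT, work_order_number TEXT);
CREATE TABLE s_inspections (client_id INTEGER, work_orders_hash_key TEXT,
                            inspection_id INTEGER, dss_version INTEGER, response TEXT);
INSERT INTO h_work_orders VALUES (1, 'h1', 'WO-1'), (1, 'h2', 'WO-2'), (2, 'h3', 'WO-3');
INSERT INTO s_inspections VALUES
    (1, 'h1', 10, 1, 'old'), (1, 'h1', 10, 2, 'new'),
    (1, 'h2', 20, 1, 'only'), (2, 'h3', 30, 1, 'other client');
""")

# Original form: ROW_NUMBER() computed directly in the joined SELECT.
inline_sql = """
SELECT wo.work_orders_hash_key, si.inspection_id, si.response,
       ROW_NUMBER() OVER (PARTITION BY si.work_orders_hash_key, si.inspection_id
                          ORDER BY si.dss_version DESC) AS rnk
FROM h_work_orders wo
JOIN s_inspections si
  ON wo.client_id = si.client_id AND wo.work_orders_hash_key = si.work_orders_hash_key
WHERE wo.client_id IN (1)
"""

# Answer's form: the join lives in a CTE, ROW_NUMBER() in the outer query.
cte_sql = """
WITH x AS (
    SELECT wo.work_orders_hash_key, si.inspection_id, si.response, si.dss_version
    FROM h_work_orders wo
    JOIN s_inspections si
      ON wo.client_id = si.client_id AND wo.work_orders_hash_key = si.work_orders_hash_key
    WHERE wo.client_id IN (1)
)
SELECT x.work_orders_hash_key, x.inspection_id, x.response,
       ROW_NUMBER() OVER (PARTITION BY x.work_orders_hash_key, x.inspection_id
                          ORDER BY x.dss_version DESC) AS rnk
FROM x
"""

inline_rows = sorted(conn.execute(inline_sql).fetchall())
cte_rows = sorted(conn.execute(cte_sql).fetchall())
print(inline_rows == cte_rows)  # True

# rnk = 1 marks the highest dss_version per (hash key, inspection_id)
latest = [r for r in cte_rows if r[3] == 1]
print(latest)  # [('h1', 10, 'new', 1), ('h2', 20, 'only', 1)]
```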

Related

What index do I need to create?

I have a query that is sometimes really slow. How can I speed it up?
SELECT PRODUCTS.ID,
SPECIALPRODUCTGROUPS."id" AS "isProductGroup",
PRODUCTS."OEM",
PRODUCTS.NAME,
MAIN."stockBalance" AS STOCKBALANCE,
PRODUCTS."minShippingRate",
PRODUCTS."externalId",
ARTICLE,
"categoryId",
BRAND,
PRICES."price" AS "price"
FROM PUBLIC."Products" AS PRODUCTS
INNER JOIN PUBLIC."Prices" AS PRICES ON PRODUCTS.ID = PRICES."productId"
AND PRICES."accountId" = 13576
AND PRICES."price" >= 0
AND PRICES."price" <= 337802
INNER JOIN PUBLIC."RegionalWarehouseStockBalances" AS MAIN ON PRODUCTS.ID = MAIN."productId"
AND MAIN."warehouseId" = 1
AND MAIN."stockBalance" > 0
LEFT JOIN PUBLIC."SpecialProductGroups" AS SPECIALPRODUCTGROUPS ON PRODUCTS."productGroupId" = SPECIALPRODUCTGROUPS."productGroupId"
AND SPECIALPRODUCTGROUPS."accountId" = 13576
AND NOW() < SPECIALPRODUCTGROUPS."finishedAt"
WHERE PRODUCTS."active" = TRUE
ORDER BY BRAND ASC
LIMIT 50
Here is the EXPLAIN output for this query. I can't add it as text because Stack Overflow complains about the amount of code, so I put it here: https://explain.depesz.com/s/4UAg
I tried creating indexes on RegionalWarehouseStockBalances, but none of my variants helped.
I am using PostgreSQL 12
You need to run
VACUUM "Prices";
so that the index-only scan has few "heap fetches". That will make all the difference.
Reduce autovacuum_vacuum_scale_factor for that table so that the system vacuums the table frequently.

Extremely slow planning for query with lot of joins in PostgreSQL

(Postgres v13)
I've got a query that takes 2-5 seconds to plan. The query joins my languages table and translations table to get translation results for multiple languages. When I add even more languages/translations to load, the time grows exponentially.
select
key0_.id as col_0_0_,
key0_.name as col_1_0_,
(select
count(screenshot60_.id)
from
screenshot screenshot60_
inner join
key key61_
on screenshot60_.key_id=key61_.id
where
key0_.id=key61_.id) as col_2_0_,
languages2_.tag as col_3_0_,
translatio31_.id as col_4_0_,
translatio31_.text as col_5_0_,
translatio31_.state as col_6_0_,
translatio31_.auto as col_7_0_,
translatio31_.mt_provider as col_8_0_,
languages3_.tag as col_11_0_,
translatio32_.id as col_12_0_,
translatio32_.text as col_13_0_,
translatio32_.state as col_14_0_,
translatio32_.auto as col_15_0_,
translatio32_.mt_provider as col_16_0_,
... the same over and over many times ...
languages30_.tag as col_227_0_,
translatio59_.id as col_228_0_,
translatio59_.text as col_229_0_,
translatio59_.state as col_230_0_,
translatio59_.auto as col_231_0_,
translatio59_.mt_provider as col_232_0_,
0 as col_233_0_,
0 as col_234_0_
from
key key0_
inner join
project project1_
on key0_.project_id=project1_.id
inner join
language languages2_
on project1_.id=languages2_.project_id
and (
languages2_.tag='en-US'
)
inner join
language languages3_
on project1_.id=languages3_.project_id
and (
languages3_.tag='es-PE'
)
... many times the same ...
inner join
language languages30_
on project1_.id=languages30_.project_id
and (
languages30_.tag='es-MX'
)
left outer join
translation translatio31_
on key0_.id=translatio31_.key_id
and (
translatio31_.language_id=languages2_.id
)
... many times the same ...
left outer join
translation translatio59_
on key0_.id=translatio59_.key_id
and (
translatio59_.language_id=languages30_.id
)
where
(
key0_.name in (
'base_administrative_notes.desc'
)
)
and
key0_.project_id=836
group by
key0_.id ,
languages2_.tag ,
translatio31_.id ,
languages3_.tag ,
translatio32_.id ,
... many times the same ...
languages30_.tag ,
translatio59_.id
order by
key0_.name asc nulls first,
key0_.id asc nulls first limit 1
The visualised EXPLAIN ANALYSE result: https://explain.dalibo.com/plan/uWS (the full query can be found there as well as raw output from explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)).
I found in other threads that this can be caused by having too many indexes on the tables, but I only have a unique index on my translations table, on the key_id and language_id columns.
EDIT:
I've found that setting join_collapse_limit to a value between 1 and 5 reduces planning time to under 200 ms. I don't know if this is the best solution, but I am going to use it as a workaround for now.
As Laurenz Albe explained, the planner is probably trying to reorder the joins to optimize the query.
With n tables, the number of possible join orders is n! (n factorial).
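This small Python sketch makes the growth concrete. It computes n! for a few table counts, including the roughly 60 relations visible in the question's main FROM clause. The count is read off the aliases (key0_, project1_, 29 language aliases and 29 translation aliases), so treat it as approximate. n! is the number of left-deep join orderings. PostgreSQL's planner does not literally try every permutation, but the figure shows why reordering that many joins gets expensive. With join_collapse_limit = 1, explicit JOINs are kept in the written order, so that search largely disappears.

```python
import math

# Relations in the question's main FROM clause, read off the aliases:
# key0_, project1_, languages2_..languages30_ (29), translatio31_..translatio59_ (29).
N_TABLES = 2 + 29 + 29

def left_deep_join_orders(n: int) -> int:
    """Number of orderings of n relations in a left-deep join tree: n!."""
    return math.factorial(n)

for n in (5, 8, 12, N_TABLES):
    # Scientific notation keeps the 82-digit value for 60 tables readable.
    print(f"{n:>2} tables -> {left_deep_join_orders(n):.3e} join orders")
```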
My suggestion is to:
make sure the join order written in your query is the best one
set that particular parameter (join_collapse_limit) to 1 before the query
run the query
reset the parameter
You can check Alicja's slide deck (from slide 22), where she illustrates this particular problem with examples: https://www.postgresql.eu/events/pgconfeu2017/sessions/session/1617/slides/9/FromMinutesToMilliseconds.pdf

Windowed Sum not displaying data correctly

I have a weird question for you. I have a value coming from a subquery, and I am applying a windowed function to it to get a running total. However, where the value is (legitimately) repeated, the individual sums get rolled up into one. I will paste my redacted code and results below.
SELECT
    (Dat.[Field_A]/Dat.[Field_B])*100 [Value],
    SUM((Dat.[Field_A]/Dat.[Field_B])*100) OVER (ORDER BY Dat.[Field_A] DESC) RunningTotal
FROM
(
    [SUB QUERY]
) Dat
The results come out as shown below.
Value RunningTotal
17.50501775 17.51
15.7074377 48.92
15.7074377 48.92
10.12725342 59.05
8.098755369 67.15
7.450983484 74.6
6.886517246 81.48
6.842160695 88.33
6.839469823 95.17
4.83496681 100
As you can see, the 2nd and 3rd rows both have a value of 15.7074377, but they are added to the running total as a single value of 31.4148754. The running total for row 2 should be 33.21. The 4th row is correct.
Any idea what's happening here?
Thanks in advance
It's a bit of a guess based on your info, but I think the problem here is that you actually need the sum of the sum. You could use a CTE to solve this, or just try this:
SELECT
([SUB QUERY].[Field_A]/[SUB QUERY].[Field_B])*100 [Value],
SUM (SUM([SUB QUERY].[Field_A]/[SUB QUERY].[Field_B])*100 OVER (ORDER BY [SUB QUERY].[Field_A] DESC)) RunningTotal
FROM ([SUB QUERY]) AS Dat
Unfortunately I can't show the data as it is all very sensitive and I have been instructed not to.
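The good news, though, is that I found the answer (here). The column in the window's ORDER BY had identical values. With no explicit frame, the window uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats tied rows as peers. So all consecutive identical values are rolled up into one running total.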
This will demonstrate the point though if you want to see it
DECLARE @Staging TABLE (Subtotal INT)
INSERT INTO
    @Staging (Subtotal)
VALUES
    (1),(2),(3),(3),(5),(6),(7),(8),(9),(10)
SELECT
    Subtotal,
    SUM(Subtotal) OVER (ORDER BY Subtotal) RunningTotal
FROM
    @Staging
Notice that the repeated 3 suffers from the same issue I described above. By adding ROW_NUMBER() OVER (ORDER BY Field_A DESC) to the subquery, I was able to order the running total by the new ID, and it worked like a charm.
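The behaviour can be reproduced outside SQL Server. This sketch uses Python's sqlite3 module and assumes SQLite 3.25+. SQLite applies the same default frame as SQL Server when a window has an ORDER BY but no explicit frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The sketch runs the staging data from above three ways:

* the default frame;
* the ROW_NUMBER() fix from the thread;
* an explicit ROWS frame. This alternative is swapped in here and is not mentioned in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (Subtotal INTEGER)")
conn.executemany("INSERT INTO staging VALUES (?)",
                 [(v,) for v in (1, 2, 3, 3, 5, 6, 7, 8, 9, 10)])

def running_totals(sql):
    """Run a query whose only column is a running total; return it as a list."""
    return [row[0] for row in conn.execute(sql)]

# Default frame (RANGE ... CURRENT ROW): the two 3s are peers and share one total.
# Totals never decrease here, so ordering by them reproduces the window order.
default_frame = running_totals("""
    SELECT SUM(Subtotal) OVER (ORDER BY Subtotal) AS running
    FROM staging ORDER BY running""")

# Fix from the thread: number the rows first, then order the window by that number.
with_row_number = running_totals("""
    SELECT SUM(Subtotal) OVER (ORDER BY rn) AS running
    FROM (SELECT Subtotal, ROW_NUMBER() OVER (ORDER BY Subtotal) AS rn FROM staging) AS t
    ORDER BY rn""")

# Alternative: keep ORDER BY Subtotal but use a ROWS frame, so ties are not peers.
rows_frame = running_totals("""
    SELECT SUM(Subtotal) OVER (ORDER BY Subtotal
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running
    FROM staging ORDER BY running""")

print(default_frame)    # [1, 3, 9, 9, 14, 20, 27, 35, 44, 54]
print(with_row_number)  # [1, 3, 6, 9, 14, 20, 27, 35, 44, 54]
print(rows_frame)       # [1, 3, 6, 9, 14, 20, 27, 35, 44, 54]
```

In T-SQL, the ROWS variant would be written as SUM(Subtotal) OVER (ORDER BY Subtotal ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).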

Is there a more efficient / elegant way to write this code I have?

I'm wondering if anybody can help me out with any or all of this code below. I've made it work, but it seems inefficient to me and is probably quite a bit slower than optimal.
Some basic background on the necessity of this code in the first place:
I have a table of shipping records that does not include the corresponding invoice number. I've looked all through the tables, and I continue to do so. In fact, only this morning I discovered that if a packing slip has been generated, I can link the shipping table to the packing slip table via the packing slip ID and grab the invoice number from there. Absent that link, however, I'm forced to guess. In most instances that's not terribly difficult, because the invoice table has number, line and release columns that can be matched up. But when there are multiple shipments for the same number, line and release (for instance, when a line is partially shipped), there can be multiple answers, only one of which is correct. I am partially helped by a column in the shipping table that gives the date sequence for that number, line and release. Even so, there are still cases where my "guessing" process is ambiguous.
Here is what my procedure does. First, it creates a table of data that includes the invoice number when there is a packing slip to link it through.
Next, it copies all of that data into a second table. This time, only where the invoice was NULL in the first table, it uses a "guess" at the invoice number. To make the guess, it partitions all the shipping records by number, line, release, date sequence and date, does the same for the invoice table, and tries to line everything up by date.
Finally, it loops through that table, finds any remaining NULLs, and essentially matches them with the first invoice record for that number, line and release.
Both kinds of guess get a prefix character (* or ^) to show that they are, in fact, guesses.
IF OBJECT_ID('tempdb..#cosTable') IS NOT NULL
    DROP TABLE #cosTable
-- Table variable for the final result (table variables take @, not #)
DECLARE @cosTable2 TABLE (
    ID INT IDENTITY
    ,co_num CoNumType
    ,co_line CoLineType
    ,co_release CoReleaseType
    ,date_seq DateSeqType
    ,ship_date DateType
    ,inv_num NVARCHAR(14)
)
DECLARE
    @co_num_ck CoNumType
    ,@co_line_ck CoLineType
    ,@co_release_ck CoReleaseType
DECLARE @Counter1 INT = 0

-- Step 1: shipping records, with the invoice number wherever a packing slip links them
SELECT cos.co_num, cos.co_line, cos.co_release, cos.date_seq, cos.ship_date, cos.qty_invoiced, pck.inv_num
INTO #cosTable
FROM co_ship cos
LEFT JOIN pckitem pck
    ON cos.pack_num = pck.pack_num
    AND cos.co_num = pck.co_num
    AND cos.co_line = pck.co_line
    AND cos.co_release = pck.co_release

-- Step 2: guess the invoice by pairing the nth shipment with the nth invoice line
;WITH cos_Order
AS (
    SELECT co_num, co_line, co_release, qty_invoiced, date_seq, ship_date,
        ROW_NUMBER() OVER (PARTITION BY co_num, co_line, co_release ORDER BY ship_date) AS cosrow
    FROM co_ship
    WHERE qty_invoiced > 0
),
invi_Order
AS (
    SELECT inv_num, co_num, co_line, co_release,
        ROW_NUMBER() OVER (PARTITION BY co_num, co_line, co_release ORDER BY RecordDate) AS invirow
    FROM inv_item
    WHERE qty_invoiced > 0
),
cos_invi
AS (
    SELECT cosO.*, inviO.inv_num
    FROM cos_Order cosO
    LEFT JOIN invi_Order inviO
        ON cosO.co_num = inviO.co_num
        AND cosO.co_line = inviO.co_line
        AND cosO.co_release = inviO.co_release  -- row numbers restart per release
        AND cosO.cosrow = inviO.invirow
)
INSERT INTO @cosTable2 (co_num, co_line, co_release, date_seq, ship_date, inv_num)
SELECT cosT.co_num, cosT.co_line, cosT.co_release, cosT.date_seq, cosT.ship_date,
    COALESCE(cosT.inv_num, '*' + cosi.inv_num) AS inv_num
FROM #cosTable cosT
LEFT JOIN cos_invi cosi
    ON cosT.co_num = cosi.co_num
    AND cosT.co_line = cosi.co_line
    AND cosT.co_release = cosi.co_release
    AND cosT.date_seq = cosi.date_seq
    AND cosT.ship_date = cosi.ship_date

-- Step 3: row by row, fill a remaining NULL with an inv_num (TOP 1, no ORDER BY)
-- from the same number/line/release
WHILE @Counter1 < (SELECT MAX(ID) FROM @cosTable2) BEGIN
    SET @Counter1 += 1
    SET @co_num_ck = (SELECT co_num FROM @cosTable2 WHERE ID = @Counter1)
    SET @co_line_ck = (SELECT co_line FROM @cosTable2 WHERE ID = @Counter1)
    SET @co_release_ck = (SELECT co_release FROM @cosTable2 WHERE ID = @Counter1)
    IF EXISTS (SELECT * FROM @cosTable2 WHERE ID = @Counter1 AND inv_num IS NULL)
        UPDATE @cosTable2
        SET inv_num = '^' + (SELECT TOP 1 inv_num FROM @cosTable2 WHERE
            @co_num_ck = co_num AND
            @co_line_ck = co_line AND
            @co_release_ck = co_release)
        WHERE ID = @Counter1 AND inv_num IS NULL
END

SELECT * FROM @cosTable2
ORDER BY co_num, co_line, co_release, date_seq, ship_date
You're in a bad spot. As @craig.white and @HLGEM suggest, you've inherited something without sufficient constraints to make the data correct or safe... and now you have to "synthesize" it. I get that guesses are the best you can do, and you can at least make your guesses perform reasonably.
After that, you should squeal loudly to get some time to fix the db - to apply the constraints needed to prevent further crapification of the data.
Performance-wise, the while loop is a disaster. You'd be better off replacing that whole mess with a single update statement...something like:
update c0
set inv_num = '^' + c1.inv_num
from
    @cosTable2 c0
    left outer join
    (
        select
            co_num,
            co_line,
            co_release,
            inv_num
        from
            @cosTable2
        where
            inv_num is not null
        group by
            co_num,
            co_line,
            co_release,
            inv_num
    ) as c1
    on
        c0.co_num = c1.co_num and
        c0.co_line = c1.co_line and
        c0.co_release = c1.co_release
where
    c0.inv_num is null
...which does the same thing the loop does, only in a single statement.
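Here is a minimal sketch of the same set-based idea, runnable with Python's sqlite3 module. It rests on these assumptions:

* The table is a hypothetical cos_table2 with a few made-up rows, because SQLite has no #temp or @table-variable naming.
* SQLite concatenates strings with || rather than +.
* A correlated subquery with ORDER BY ID LIMIT 1 stands in for the loop's TOP 1. It picks the non-null inv_num with the lowest ID, as a deterministic choice.

As in T-SQL with default settings, '^' concatenated with NULL stays NULL, so rows with no candidate remain NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cos_table2 (
    ID INTEGER PRIMARY KEY, co_num TEXT, co_line INTEGER,
    co_release INTEGER, inv_num TEXT);
INSERT INTO cos_table2 (co_num, co_line, co_release, inv_num) VALUES
    ('A', 1, 0, 'INV1'), ('A', 1, 0, NULL),
    ('B', 2, 0, '*INV7'), ('B', 2, 0, NULL),
    ('C', 3, 0, NULL);
""")

# One UPDATE replaces the WHILE loop: every NULL inv_num takes the first
# non-null inv_num (lowest ID) from the same co_num/co_line/co_release,
# prefixed with '^'. Values already starting with '^' are excluded, so the
# result does not depend on the order in which rows get updated.
conn.execute("""
UPDATE cos_table2
SET inv_num = '^' || (
    SELECT c1.inv_num FROM cos_table2 AS c1
    WHERE c1.co_num = cos_table2.co_num
      AND c1.co_line = cos_table2.co_line
      AND c1.co_release = cos_table2.co_release
      AND c1.inv_num IS NOT NULL
      AND substr(c1.inv_num, 1, 1) <> '^'
    ORDER BY c1.ID
    LIMIT 1)
WHERE inv_num IS NULL
""")

result = conn.execute("SELECT ID, inv_num FROM cos_table2 ORDER BY ID").fetchall()
print(result)
# [(1, 'INV1'), (2, '^INV1'), (3, '*INV7'), (4, '^*INV7'), (5, None)]
```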
It seems to me that you are trying very hard to solve a problem that should not exist. What you describe is an unfortunately common situation. A process has grown organically along with the business, without intent or specific direction, and that has made data extraction nearly impossible to automate. You very much need a set of policies and procedures. For a (very crude and simple) example:
1: An order must exist before a packing slip can be generated.
2: A packing slip must exist before an invoice can be generated.
3: An invoice is created using data from the packing slip and the order (what was requested, what was picked, what we bill).
Again, this is a crude example, just to illustrate the idea.
All of the data MUST be entered at the proper time, or someone has not done their job.
It is not part of the IT department's typical skill set to accurately and consistently provide management with good data when such data does not exist.

SQLITE : Optimize ORDER BY Query

I am an iOS developer. We currently have 2.5 lakh (250,000) records stored in the database, and we have implemented search functionality on top of them. Below is the query we are using.
select CustomerMaster.CustomerName ,CustomerMaster.CustomerNumber,
CallActivityList.CallActivityID,CallActivityList.CustomerID,CallActivityList.UserID,
CallActivityList.ActivityType,CallActivityList.Objective,CallActivityList.Result,
CallActivityList.Comments,CallActivityList.CreatedDate,CallActivityList.UpdateDate,
CallActivityList.CallDate,CallActivityList.OrderID,CallActivityList.SalesPerson,
CallActivityList.GratisProduct,CallActivityList.CallActivityDeviceID,
CallActivityList.IsExported,CallActivityList.isDeleted,CallActivityList.TerritoryID,
CallActivityList.TerritoryName,CallActivityList.Hours,UserMaster.UserName,
(FirstName ||' '||LastName) as UserNameFull,UserMaster.TerritoryID as UserTerritory
from
CallActivityList
inner join CustomerMaster
ON CustomerMaster.DeviceCustomerID = CallActivityList.CustomerID
inner Join UserMaster
On UserMaster.UserID = CallActivityList.UserID
where
(CustomerMaster.CustomerName like '%T%' or
CustomerMaster.CustomerNumber like '%T%' or
CallActivityList.ActivityType like '%T%' or
CallActivityList.TerritoryName like '%T%' or
CallActivityList.SalesPerson like '%T%' )
and CallActivityList.IsExported!='2' and CallActivityList.isDeleted != '1'
order by
CustomerMaster.CustomerName
limit 50 offset 0
Without ORDER BY, the query returns results in 0.5 seconds. When I add ORDER BY, the time increases to 2 seconds.
I have tried indexing, but it made no noticeable difference. Can anyone please help? If this can't be fixed in the query itself, how else can we make it fast?
Thanks in advance.
This is due to the LIMIT. Without ORDER BY, the query can stop as soon as it has found 50 matching records, and any 50 will be returned. With ORDER BY, all the matching records have to be processed to determine which ones are the first 50 in that order.
The problem is that the ORDER BY is on a column of a joined table. Otherwise you could apply the limit to the main table (I assume it is CallActivityList) first and then join.
SELECT ...
FROM
(SELECT ... FROM CallActivityList ORDER BY ... LIMIT 50 OFFSET 0) AS CAL
INNER JOIN CustomerMaster ON ...
INNER JOIN UserMaster ON ...
ORDER BY ...
This would reduce the costs for joining the tables. If this is not possible, try at least to join CallActivityList with CustomerMaster. Apply the limit to those and finally join with UserMaster.
SELECT ...
FROM
(SELECT ...
FROM
CallActivityList
INNER JOIN CustomerMaster ON ...
ORDER BY CustomerMaster.CustomerName
LIMIT 50 OFFSET 0) AS ActCust
INNER JOIN UserMaster ON ...
ORDER BY ...
Also, to make the ordering unambiguous, I would include more columns in the ORDER BY, such as the call date and call ID. Otherwise the paging could be inconsistent.
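The following minimal sketch shows the second variant: join CallActivityList with CustomerMaster, apply the limit, then join UserMaster. It uses Python's sqlite3 module with small generated tables. The schemas keep only a few of the question's columns, the LIKE filters are omitted, and the rows are made up.

The rewrite is only equivalent because every UserID has exactly one UserMaster row, so the last join neither drops nor duplicates rows. The ORDER BY also ends with the unique CallActivityID, following the tiebreaker advice above, so both queries pick the same 50 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerMaster (DeviceCustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE UserMaster (UserID INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE CallActivityList (CallActivityID INTEGER PRIMARY KEY,
                               CustomerID INTEGER, UserID INTEGER, CallDate TEXT);
""")
# Hypothetical data: many customers share a name, so ties are common.
conn.executemany("INSERT INTO CustomerMaster VALUES (?, ?)",
                 [(i, f"Cust{i % 7}") for i in range(1, 21)])
conn.executemany("INSERT INTO UserMaster VALUES (?, ?)",
                 [(i, f"User{i}") for i in range(1, 4)])
conn.executemany("INSERT INTO CallActivityList VALUES (?, ?, ?, ?)",
                 [(i, i % 20 + 1, i % 3 + 1, f"2020-01-{i % 28 + 1:02d}")
                  for i in range(1, 201)])

# Deterministic ordering: name, then call date, then the unique call id.
ORDER = "CustomerName, CallDate, CallActivityID"

# Join everything, then sort and limit.
join_then_limit = conn.execute(f"""
    SELECT cal.CallActivityID, cm.CustomerName, cal.CallDate, um.UserName
    FROM CallActivityList cal
    JOIN CustomerMaster cm ON cm.DeviceCustomerID = cal.CustomerID
    JOIN UserMaster um ON um.UserID = cal.UserID
    ORDER BY {ORDER} LIMIT 50 OFFSET 0""").fetchall()

# Sort and limit the activity/customer join first, then join UserMaster.
limit_then_join = conn.execute(f"""
    SELECT ac.CallActivityID, ac.CustomerName, ac.CallDate, um.UserName
    FROM (SELECT cal.CallActivityID, cal.UserID, cal.CallDate, cm.CustomerName
          FROM CallActivityList cal
          JOIN CustomerMaster cm ON cm.DeviceCustomerID = cal.CustomerID
          ORDER BY {ORDER} LIMIT 50 OFFSET 0) AS ac
    JOIN UserMaster um ON um.UserID = ac.UserID
    ORDER BY {ORDER}""").fetchall()

print(join_then_limit == limit_then_join)  # True
```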