Windowed Sum not displaying data correctly - tsql

I have a weird question for you. I have a value coming from a sub query, to which I am applying a windowed function in order to get a running total. However, where the value is repeated (legitimately), the individual sums are getting rolled up into one. I will paste my redacted code and results below.
SELECT
([SUB QUERY].[Field_A]/[SUB QUERY].[Field_B])*100 [Value],
SUM([SUB QUERY].[Field_A]/[SUB QUERY].[Field_B]*100) OVER (ORDER BY [SUB QUERY].[Field_A] DESC) RunningTotal
FROM
(
[SUB QUERY]
) Dat
The results come out as shown below.
Value RunningTotal
17.50501775 17.51
15.7074377 48.92
15.7074377 48.92
10.12725342 59.05
8.098755369 67.15
7.450983484 74.6
6.886517246 81.48
6.842160695 88.33
6.839469823 95.17
4.83496681 100
As you can see, the 2nd and 3rd lines both have a value of 15.7074377, but they are being added to the running total as a single value of 31.4148754. The running total for line 2 should say 33.21, and the 4th line is correct.
Any idea what's happening here?
Thanks in advance

It's a bit of a guess based on your info, but I think the problem here is that you actually need the sum of the sum. You could use a CTE to solve this, or just try this:
SELECT
([SUB QUERY].[Field_A]/[SUB QUERY].[Field_B])*100 [Value],
SUM(SUM([SUB QUERY].[Field_A]/[SUB QUERY].[Field_B])*100) OVER (ORDER BY [SUB QUERY].[Field_A] DESC) RunningTotal
FROM ([SUB QUERY]) AS Dat
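Note that the nested SUM(SUM(...)) OVER form only parses when the outer query has a GROUP BY: the inner SUM is the group aggregate and the outer SUM is the window function running over the grouped rows. A minimal self-contained sketch of the pattern, using a hypothetical @Demo table in place of the redacted sub query:
DECLARE @Demo TABLE (Field_A FLOAT, Field_B FLOAT)
INSERT INTO @Demo VALUES (50, 100), (20, 100), (10, 100), (5, 100)
SELECT
Field_A/Field_B*100 [Value],
-- inner SUM aggregates each group, outer SUM accumulates across the groups
SUM(SUM(Field_A/Field_B*100)) OVER (ORDER BY Field_A DESC) RunningTotal
FROM @Demo
GROUP BY Field_A, Field_B
-- running totals: 50, 70, 80, 85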

Unfortunately I can't show the data as it is all very sensitive and I have been instructed not to.
The good news, though, is that I found the answer (here): because I was summing the same column I was using to ORDER BY in the windowed function, all consecutive identical values get rolled up together in the running total.
This will demonstrate the point if you want to see it:
DECLARE @Staging TABLE (Subtotal INT)
INSERT INTO
@Staging (Subtotal)
VALUES
(1),(2),(3),(3),(5),(6),(7),(8),(9),(10)
SELECT
Subtotal,
SUM(Subtotal) OVER (ORDER BY Subtotal) RunningTotal
FROM
@Staging
Notice that the repeated 3 suffers from the same issue I described above. By adding ROW_NUMBER() OVER (ORDER BY Field_A DESC) to the sub query, I was able to ORDER BY that new unique ID instead, and it worked like a charm.
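For reference, the tie behaviour comes from the default window frame: with an ORDER BY, SUM() OVER defaults to RANGE UNBOUNDED PRECEDING, which treats rows with equal ORDER BY values as peers and gives them all the same running total. Besides the ROW_NUMBER() workaround, making the frame explicit also fixes the demo above:
SELECT
Subtotal,
-- ROWS (unlike the default RANGE) advances the frame one row at a time, so ties accumulate separately
SUM(Subtotal) OVER (ORDER BY Subtotal ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) RunningTotal
FROM
@Staging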

Optimize KDB query time to get rolling average price from each contributor

Each time a contributor gives an updated price, I want to use this quote along with the latest prices from the other contributors to calculate the overall average at that moment.
t:`time xasc flip (`userID`time`price)!(`quote1`quote2`quote3`quote3`quote3`quote3`quote4`quote2`quote4`quote3`quote2`quote3`quote1`quote3`quote4`quote1`quote4`quote2`quote2`quote4;(21:11:37 03:13:29 15:35:39 09:59:13 04:34:15 13:09:01 21:21:55 16:54:39 04:03:04 18:22:39 17:05:44 05:08:40 07:35:50 15:46:15 17:32:29 19:42:47 03:28:48 04:20:03 14:16:55 09:02:12);86.4 84.4 54.26 7.76 63.75 97.61 53.97 71.63 38.86 52.23 87.25 65.69 96.25 37.15 17.45 58.97 95.51 61.59 70.25 35.5)
Desired output below
delete userIDPriceList,userIDComps from t,'raze {[idx;tab] select avgPrice:avg price, userIDPriceList:price,userIDComps:userID from select last price by userID from t where i <= idx}[;t] each til count t
The userIDPriceList and userIDComps columns are not required in the final output.
Performance is slow, and I'm looking for a better way to calculate this.
q) \t do[200000;delete userIDPriceList,userIDComps from t,'raze {[idx;tab] select avgPrice:avg price, userIDPriceList:price,userIDComps:userID from select last price by userID from t where i <= idx}[;t] each til count t]
10152j
Thanks in advance
Based on your clarified requirements, another approach is to accumulate using scan:
update avgPrice:avg each{x,(1#y)!1#z}\[();userID;price] from t
Igor's solution is faster if the data is static (i.e. you can prep the table with the attribute once).
Below code gives average of all previous prices for given userID including current row:
ungroup 0!select time, price, avgPrice: avgs price by userID from t
Just ensure that t is appropriately sorted by time before getting averages.
According to your comment to one of the answers, you're "trying to take the average prices of each userID as of the time of the record while ignoring any future records."
This query will do exactly that:
select userID,time,price,avgPrice:(avgs;price)fby userID from t
A query of yours (delete userIDPriceList ...) results in something different, as @Anton Dovzhenko pointed out in his comment to your original question.
Update
After reading your comment, I think I understand your requirement. You could probably do this:
prices:exec `s#time!price by userID from t;
update avgPrice:avg each flip prices[;time] from t

How to keep one record in specific column and make other record value 0 in group by clause in PostgreSQL?

I have a set of data like this:
The result should look like this:
My query:
SELECT max(pi.pi_serial) AS proforma_invoice_id,
max(mo.manufacturing_order_master_id) AS manufacturing_order_master_id,
max(pi.amount_in_local_currency) AS sales_value
FROM proforma_invoice pi
JOIN schema_order_map som ON pi.pi_serial = som.pi_id
LEFT JOIN manufacturing_order_master mo ON som.mo_id = mo.manufacturing_order_master_id
WHERE to_date(pi.proforma_invoice_date, 'DD/MM/YYYY') BETWEEN to_date('01/03/2021', 'DD/MM/YYYY') AND to_date('19/04/2021', 'DD/MM/YYYY')
AND pi.pi_serial in (9221,
9299)
GROUP BY mo.manufacturing_order_master_id,
pi.pi_serial
ORDER BY pi.pi_serial
Option 1: Create a "Running Total" field in Crystal Reports to sum up only one "sales_value" per "proforma_invoice_id".
Option 2: Add a helper column to your PostgreSQL query like so:
case
when row_number()
over (partition by proforma_invoice_id
order by manufacturing_order_master_id)
= 1
then sales_value
else 0
end
as sales_value
I prepared this SQLFiddle with an example for you (and would of course like to encourage you to do the same for your next db-query-related question on SO, too :-)).
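Wired into the query from the question, option 2 could look like the sketch below (column names are taken from the question; the date filter and the GROUP BY are dropped for brevity, since the window function does the per-invoice numbering):
SELECT pi.pi_serial AS proforma_invoice_id,
       mo.manufacturing_order_master_id,
       CASE
         WHEN row_number() OVER (PARTITION BY pi.pi_serial
                                 ORDER BY mo.manufacturing_order_master_id) = 1
           THEN pi.amount_in_local_currency
         ELSE 0
       END AS sales_value
FROM proforma_invoice pi
JOIN schema_order_map som ON pi.pi_serial = som.pi_id
LEFT JOIN manufacturing_order_master mo ON som.mo_id = mo.manufacturing_order_master_id
WHERE pi.pi_serial IN (9221, 9299)
ORDER BY pi.pi_serial;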

selecting from a view is taking longer than 30+ minutes

I am working on making this view fast enough to fetch its result set in a reasonable time. At the moment it takes more than 30 minutes, goes parallel, and causes all sorts of pain with increased CPU time. I have identified the problem query, but I can't figure out how to cut the execution time, either by rewriting the query or by adding an appropriate index. We already have a clustered index on client_id and a non-clustered index on the hash_key column in both tables. The two joined tables are large: about 238 million records in work_orders and 287,011,570 records in s_inspections.
select
wo.client_id,
wo.work_orders_hash_key,
wo.work_order_number,
wo.work_order_id,
si.inspection_id,
si.inspection_name,
si.inspection_detail,
si.master_inspection_id,
si.master_inspection_detail,
si.status_id,
si.exception,
si.inspection_order,
si.comment,
si.[procedure_id],
si.[flag_id],
si.[asset_id],
si.[asset_name],
si.[inspection_status],
si.[is_removed],
si.[response],
row_number() over(partition by si.work_orders_hash_key, si.inspection_id order by si.dss_version desc) rnk
from
datavault.dbo.h_work_orders wo with (readuncommitted)
join datavault.dbo.s_inspections si with (readuncommitted) on wo.client_id = si.client_id and wo.work_orders_hash_key = si.work_orders_hash_key
where
wo.client_id in (7700876368663, 8800387996408)
Below is the estimated execution plan; the query was taking quite some time, so I couldn't provide the actual execution plan.
https://www.brentozar.com/pastetheplan/?id=ryLzvNwUN
Any help would be greatly appreciated.
Your compute scalar is 59% of your query cost.
I would guess it's this line:
row_number() over(partition by si.work_orders_hash_key, si.inspection_id order by si.dss_version desc) rnk
It's estimating 159014000000000 rows!
Whack this line (it's a lot of work to return a row number) and run it again.
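If the window function has to stay, one thing worth trying (an untested sketch, not from the original thread) is a covering index aligned with the partition and order columns, so the engine can avoid sorting hundreds of millions of rows:
-- hypothetical index name; extend the INCLUDE list with the remaining selected columns
CREATE NONCLUSTERED INDEX IX_s_inspections_window
ON datavault.dbo.s_inspections (client_id, work_orders_hash_key, inspection_id, dss_version DESC)
INCLUDE (inspection_name, inspection_detail, status_id);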
Maybe this will work to keep you in business, since the row_number() was the issue. Try:
;with x as (
select
wo.client_id,
wo.work_orders_hash_key,
wo.work_order_number,
wo.work_order_id,
si.inspection_id,
si.inspection_name,
si.inspection_detail,
si.master_inspection_id,
si.master_inspection_detail,
si.status_id,
si.exception,
si.inspection_order,
si.comment,
si.[procedure_id],
si.[flag_id],
si.[asset_id],
si.[asset_name],
si.[inspection_status],
si.[is_removed],
si.[response],
si.dss_version
from
datavault.dbo.h_work_orders wo with (readuncommitted)
join datavault.dbo.s_inspections si with (readuncommitted) on wo.client_id = si.client_id and wo.work_orders_hash_key = si.work_orders_hash_key
where
wo.client_id in (7700876368663, 8800387996408)
)
select
x.client_id,
x.work_orders_hash_key,
x.work_order_number,
x.work_order_id,
x.inspection_id,
x.inspection_name,
x.inspection_detail,
x.master_inspection_id,
x.master_inspection_detail,
x.status_id,
x.exception,
x.inspection_order,
x.comment,
x.[procedure_id],
x.[flag_id],
x.[asset_id],
x.[asset_name],
x.[inspection_status],
x.[is_removed],
x.[response],
row_number() over(partition by x.work_orders_hash_key, x.inspection_id order by x.dss_version desc) rnk
from x;
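If the rnk column exists only to pick the newest dss_version per inspection (an assumption; the question doesn't say), filtering on it explicitly gives the optimizer a chance to limit the work (column list trimmed for brevity):
select *
from (
    select
        si.work_orders_hash_key,
        si.inspection_id,
        si.dss_version,
        row_number() over (partition by si.work_orders_hash_key, si.inspection_id order by si.dss_version desc) rnk
    from datavault.dbo.s_inspections si with (readuncommitted)
    where si.client_id in (7700876368663, 8800387996408)
) ranked
where ranked.rnk = 1; -- keep only the latest version per inspection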

Postgres query for report

I'm trying to solve this problem:
I have a query/view that joins ~10 tables to extract some fields for a report (if any). The query doesn't use any grouping function, only joins, and it cuts off some unneeded data.
Starting from this one big view, I need to group by the first column, take the max of a date in the second column, and return all the other fields from the record that holds that max value.
I haven't been able to do this in Postgres.
As pseudo code, I can give this:
select 1
, max(2)
, 3 referred to the record from max(2)
, 4 referred to the record from max(2)
, ...
, 20 referred to the record from max(2)
from (ViewWithAllJoins) a
group by 1
For privacy and business reasons I had to obfuscate some information; 1/2/3/4... are the names of the columns from the view "ViewWithAllJoins". I hope the problem is still understandable and solvable!
I've tried the WINDOW clause as described in Convert keep dense_rank from Oracle query into postgres, but I can't combine it with the GROUP BY that I need. I've also tried dense_rank as shown in Dense_rank first Oracle to Postgresql convert, but I can't make any assumptions about the ordering of the other fields apart from 1 and 2, so I can't use any aggregate function on them.
Any ideas? Preferably without adding too many subqueries.
Thank you!
EDIT:
As suggested, I'll add some synthetic data to better illustrate the problem and what I want.
Start:
ID DATE COLUMN1 COLUMN2 COLUMN3
=====================================================================
88888888;"2016-04-02 09:00:00";"aaaaaaaaaaa";"TEXT89" ; 999999999
88888888;"2018-08-21 09:00:00";"a" ;"TEXT1" ; 988888888
88888888;"2017-11-09 09:00:00";"zzzz" ;"TEXT80000" ; 850580582
75858585;"2017-01-31 09:00:00";"~~~~~~~~~~~";"TEXT10" ; 101010101
75858585;"2018-04-02 09:00:00";"eeeeeeeeeee";"TEXT1000" ; 111111111
99999999;"2016-04-02 09:00:00";"8d2ecafd866";"TEXT808911"; 777777777
What I want:
ID DATE COLUMN1 COLUMN2 COLUMN3
===================================================================
88888888;"2018-08-21 09:00:00";"a" ;"TEXT1" ; 988888888
75858585;"2018-04-02 09:00:00";"eeeeeeeeeee";"TEXT1000" ; 111111111
99999999;"2016-04-02 09:00:00";"8d2ecafd866";"TEXT808911"; 777777777
So the group by id, the max of the date and the other fields related to the row of the max date.
So you have duplicate records per ID, and for every ID you want to select the record with the most recent date?
Use NOT EXISTS:
SELECT id,zdate,column1,column2,column3 -- , ...
FROM queryview t
WHERE NOT EXISTS (
SELECT *
FROM queryview x
WHERE x.id=t.id
AND x.zdate > t.zdate
);
Or, use row_number() over a window, and pick only the row with the final date:
SELECT id,zdate,column1,column2,column3 -- , ...
FROM ( SELECT *
, row_number() OVER(PARTITION BY id ORDER BY zdate DESC) AS rn
FROM queryview
) q
WHERE q.rn = 1
;
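PostgreSQL also has a purpose-built shortcut for this greatest-n-per-group pattern (not mentioned above, but standard Postgres): DISTINCT ON keeps the first row per id under the given ordering.
SELECT DISTINCT ON (id)
       id, zdate, column1, column2, column3 -- , ...
FROM queryview
ORDER BY id, zdate DESC;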

PostgreSQL array_agg order for window functions

The answer to my question was almost here: PostgreSQL array_agg order
Except that I wanted to array_agg over a window function:
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name
order by c2.vocabulary_id, c2.concept_name)
over (partition by ca.min_levels_of_separation),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where
c.concept_code = '44054006'
order by min_levels_of_separation;
So, maybe this will work in some future version, but I get this error
ERROR: aggregate ORDER BY is not implemented for window functions
LINE 2: select distinct c.concept_name, array_agg(c2.vocabulary_id||...
^
I should probably be selecting from a subquery like the first answer to the quoted question above suggests. I was hoping for something as simple as the order by (in that question's second answer). Or maybe I'm just being lazy about the query and should be doing a group by instead of select distinct.
I did try putting the order by in the windowing function (over (partition by ca.min_levels_of_separation order by c2.vocabulary_id, c2.concept_name)), but I get these sort of repeated rows that way:
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus"}";1
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus","MedDRA:Diabetes mellitus (incl subtypes)"}";1
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus","MedDRA:Diabetes mellitus (incl subtypes)","SNOMED:Diabetes mellitus"}";1
(btw: http://www.ohdsi.org/ if you happen to be curious about where I got the medical vocabulary tables)
Yes, it does look like I was being muddle-headed and didn't need the window function. This seems to work:
select c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name
order by c2.vocabulary_id, c2.concept_name),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where c.concept_code = '44054006'
group by c.concept_name, ca.min_levels_of_separation
order by min_levels_of_separation
I won't accept my answer for a while since it just avoids the question instead of actually answering it, and someone might have something more useful to say on the matter.
Like this:
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name ) over (partition by ca.min_levels_of_separation order by c2.vocabulary_id, c2.concept_name),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where
c.concept_code = '44054006'
order by min_levels_of_separation;
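A caveat on the query above: with an ORDER BY inside the OVER clause, the frame defaults to RANGE UNBOUNDED PRECEDING, which produces exactly the growing, repeated arrays described in the question. Keeping the ordering but widening the frame to the whole partition avoids that (a sketch against the same query):
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name) over (
    partition by ca.min_levels_of_separation
    order by c2.vocabulary_id, c2.concept_name
    rows between unbounded preceding and unbounded following),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where c.concept_code = '44054006'
order by min_levels_of_separation;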