First and Last time with Night Labor time - postgresql

I have the following problem.
This is a sample of my data (the real table has more people and more timestamps):
CREATE TABLE table_times(id,date_time,name)
AS ( VALUES
( 1000004, '2018-08-22 11:11'::timestamp without time zone, 'Carlos Eduardo' ),
( 1000004, '2018-08-22 11:43', 'Carlos Eduardo' ),
( 1000004, '2018-08-22 11:48', 'Carlos Eduardo' ),
( 1000004, '2018-08-22 11:54', 'Carlos Eduardo' ),
( 1000004, '2018-08-22 17:52', 'Carlos Eduardo' ),
( 1000004, '2018-08-23 08:13', 'Carlos Eduardo' ),
( 1000004, '2018-08-23 08:28', 'Carlos Eduardo' ),
( 1000004, '2018-08-23 10:25', 'Carlos Eduardo' ),
( 1000004, '2018-08-23 10:25', 'Carlos Eduardo' ),
( 1000004, '2018-08-23 10:25', 'Carlos Eduardo' ),
( 1000004, '2018-08-23 13:30', 'Carlos Eduardo' ),
( 1000004, '2018-08-24 22:20', 'Carlos Eduardo' ),
( 1000004, '2018-08-24 23:27', 'Carlos Eduardo' ),
( 1000004, '2018-08-25 03:14', 'Carlos Eduardo' ),
( 1000004, '2018-08-25 05:12', 'Carlos Eduardo' )
);
And I'm trying to find this:
id      | start               | end                 | name
--------+---------------------+---------------------+---------------
1000004 | 2018-08-22 11:11:00 | 2018-08-22 17:52:00 | Carlos Eduardo
1000004 | 2018-08-23 08:13:00 | 2018-08-23 13:30:00 | Carlos Eduardo
1000004 | 2018-08-24 22:20:00 | 2018-08-25 05:12:00 | Carlos Eduardo
I need to group these rows by date, as for the 22nd and 23rd, but also handle shifts that start at night and finish the next morning, as for the 24th and 25th.
Other people in the table don't have fixed working hours, so I have to derive the start and end times from this data alone. The only other constraint is a maximum of 14 working hours per day.
I tried the query below, but it doesn't work when the shift runs overnight.
SELECT
id,
MIN(date_time) AS start,
MAX(date_time) AS end,
name
FROM
table_times
GROUP BY
id
, name
, EXTRACT(DAY FROM date_time)
, EXTRACT(MONTH FROM date_time)
, EXTRACT(YEAR FROM date_time)
ORDER BY name, start ASC
Can anyone help me?
PS: Sorry for my poor English.

In order to make your problem tractable, we will need to define the boundaries for a shift. In this answer, I assume that you have two kinds of shifts. Day shifts, which start after 06:00 and end before 22:00, always occur within the same calendar day. Night shifts, which start after 22:00 and end before 06:00, wrap around across two days.
I employ an accounting trick below, by which I treat all night shift timestamps as logically belonging to the same starting date. This allows us to handle your boundary conditions.
WITH cte AS (
SELECT
id,
date_time,
name,
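-- Night shift: 22:00 or later, or before 06:00
CASE WHEN EXTRACT(HOUR FROM date_time) >= 22 OR EXTRACT(HOUR FROM date_time) < 6
THEN 1 ELSE 0 END AS shift,
-- Early-morning rows belong to the shift that started the previous evening
CASE WHEN EXTRACT(HOUR FROM date_time) < 6
THEN date_time::date - INTERVAL '1 DAY'
ELSE date_time::date END AS logical_date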
FROM table_times
)
SELECT
id,
MIN(date_time) AS start,
MAX(date_time) AS end,
name
FROM cte
GROUP BY
id,
shift,
name,
logical_date
ORDER BY
name,
start;
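The logical-date idea in the query above can also be sketched outside the database. The following Python is only an illustration under the same assumptions as the answer (night rows are those at 22:00 or later, or before 06:00); the function name `group_shifts` and its parameters are hypothetical and not part of the original answer.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_shifts(rows, night_start=22, night_end=6):
    """Return (id, start, end, name) per shift from (id, timestamp, name) rows."""
    groups = defaultdict(list)
    for emp_id, ts, name in rows:
        is_night = ts.hour >= night_start or ts.hour < night_end
        # Early-morning rows are booked on the previous calendar day,
        # so an overnight shift collapses into a single group.
        if ts.hour < night_end:
            logical_date = ts.date() - timedelta(days=1)
        else:
            logical_date = ts.date()
        groups[(emp_id, name, is_night, logical_date)].append(ts)
    shifts = [(key[0], min(stamps), max(stamps), key[1])
              for key, stamps in groups.items()]
    return sorted(shifts, key=lambda s: (s[3], s[1]))


# A subset of the question's sample timestamps.
sample = [(1000004, datetime.fromisoformat(t), "Carlos Eduardo") for t in (
    "2018-08-22 11:11", "2018-08-22 17:52",
    "2018-08-23 08:13", "2018-08-23 13:30",
    "2018-08-24 22:20", "2018-08-24 23:27",
    "2018-08-25 03:14", "2018-08-25 05:12",
)]
for shift in group_shifts(sample):
    print(shift)
```

On these rows the sketch yields three shifts, the last one running from 2018-08-24 22:20 to 2018-08-25 05:12.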


Need help in Postgres Conversion

Hello, I am trying to convert the following script from MS SQL to PostgreSQL, but I am unable to convert the starred lines:
WITH procedurerange_cte AS (
select distinct
accountsize as HospitalSize
,min(catprocsannualintegratedallpayer) over (partition by accountsize) as MinProcsPerAcct
,max(catprocsannualintegratedallpayer) over (partition by accountsize) as MaxProcsPerAcct
from sandbox.vw_hopd_universe_1_ms
group by accountsize,catprocsannualintegratedallpayer
), accts_cte AS (
select
accountsize as HospitalSize
,count(master_id) as Count
,sum(catprocsannualintegratedallpayer) as catprocsannualintegratedallpayer
from sandbox.vw_hopd_universe_1_ms
group by accountsize
), allcatprocs_cte AS (
select
sum(catprocsannualintegratedallpayer) as AllAnnCatProcs
from sandbox.accts_universeaccts
), totals_cte AS (
select
case when HospitalSize is null then 'Total' else HospitalSize end as HospitalSize
,sum(Count) as Count
,sum(catprocsannualintegratedallpayer) as catprocsannualintegratedallpayer
from accts_cte
group by grouping sets ((HospitalSize,Count,catprocsannualintegratedallpayer),())
)
select
a.HospitalSize
,a.Count
***--,convert(float,a.Count)/convert(float,(select Count from totals_cte where HospitalSize='Total')) as %OfHospitals***
,a.catprocsannualintegratedallpayer as HospitalAnnCatProcs
***--,a.catprocsannualintegratedallpayer/(select catprocsannualintegratedallpayer from totals_cte where HospitalSize='Total') as %OfHospProcs***
***--,a.catprocsannualintegratedallpayer/(select AllAnnCatProcs from allCatProcs_cte) as %OfAllProcs***
,MinProcsPerAcct
,MaxProcsPerAcct
,***CASE
when a.HospitalSize='Large' then '8 to 10'
when a.HospitalSize='Medium' then '5 to 7'
when a.HospitalSize='Small' then '0 to 4'
end as DecilesIncluded***
from totals_cte as a
left join procedurerange_cte as b
on a.HospitalSize=b.HospitalSize
Please help me convert the above script to PostgreSQL; I am new to this field.

How to `sum( DISTINCT <column> ) OVER ()` using window function?

I have the following data:
Here I have already calculated the total for each conf_id, but I also want to calculate the total for the whole partition, e.g.:
Calculate the total suma per agreement across its orders (not across the goods in each order, which are rounded slightly differently).
How do I sum 737.38 and 1238.3, i.e. take only one number from each group?
(I cannot use sum( item_suma ), because it returns 1975.67. Note that conf_suma is rounded as an intermediate step.)
Update:
Here is the full query. I want to calculate a rounded suma for each group, and then the total suma across those groups.
SELECT app_period( '2021-02-01', '2021-03-01' );
WITH
target_date AS ( SELECT '2021-02-01'::timestamptz ),
target_order as (
SELECT
tstzrange( '2021-01-01', '2021-02-01') as bill_range,
o.*
FROM ( SELECT * FROM "order_bt" WHERE sys_period #> sys_time() ) o
WHERE FALSE
OR o.agreement_id = 3385 and o.period_id = 10
),
USAGE AS ( SELECT
ocd.*,
o.agreement_id as agreement_id,
o.id AS order_id,
(dense_rank() over (PARTITION BY o.agreement_id ORDER BY o.id )) as zzzz_id,
(dense_rank() over (PARTITION BY o.agreement_id, o.id ORDER BY (ocd.ic).consumed_period )) as conf_id,
sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id ) AS agreement_suma2,
(sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period )) AS x_suma,
(sum( ocd.item_cost ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period )) AS x_cost,
(sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period ))::numeric( 10, 2) AS conf_suma,
(sum( ocd.item_cost ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period ))::numeric( 10, 2) AS conf_cost,
max((ocd.ic).consumed) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id, (ocd.ic).consumed_period ) AS consumed,
(sum( ocd.item_suma ) OVER( PARTITION BY (ocd.o).agreement_id, (ocd.o).id )) AS order_suma2
FROM target_order o
LEFT JOIN order_cost_details( o.bill_range ) ocd
ON (ocd.o).id = o.id AND (ocd.ic).consumed_period && o.app_period
)
SELECT
*,
(conf_suma/6) ::numeric( 10, 2 ) as group_nds,
(SELECT sum(x) from (SELECT sum( DISTINCT conf_suma ) AS x FROM usage sub_u WHERE sub_u.agreement_id = usage.agreement_id GROUP BY agreement_id, order_id) t) as total_suma,
(SELECT sum(x) from (SELECT (sum( DISTINCT conf_suma ) /6)::numeric( 10, 2 ) AS x FROM usage sub_u WHERE sub_u.agreement_id = usage.agreement_id GROUP BY agreement_id, order_id) t) as total_nds
FROM USAGE
WINDOW w AS ( PARTITION BY usage.agreement_id ROWS CURRENT ROW EXCLUDE TIES)
ORDER BY
order_id,
conf_id
I found a solution (see the dbfiddle).
To run a window function over distinct values, I take the first value from each peer group. To do this I:
1. aggregate the IDs of the rows in each peer group
2. lag this aggregation by one row
3. mark rows that are not in the lagged aggregate (that is, the first row of each peer group) as _distinct
4. apply sum( ) FILTER ( WHERE _distinct ) OVER ( ... )
Voilà: you get a sum over the DISTINCT values in the target PARTITION, which PostgreSQL does not implement yet.
with data as (
select * from (values
( 1, 1, 1, 1.0049 ), (2, 1,1,1.0049), ( 3, 1,1,1.0049 ) ,
( 4, 1, 2, 1.0049 ), (5, 1,2,1.0057),
( 6, 2, 1, 1.53 ), ( 7,2,1,2.18), ( 8,2,2,3.48 )
) t (id, agreement_id, order_id, suma)
),
intermediate as (select
*,
sum( suma ) over ( partition by agreement_id, order_id ) as fract_order_suma,
sum( suma ) over ( partition by agreement_id ) as fract_agreement_total,
(sum( suma::numeric(10,2) ) over ( partition by agreement_id, order_id )) as wrong_order_suma,
(sum( suma ) over ( partition by agreement_id, order_id ))::numeric( 10, 2) as order_suma,
(sum( suma ) over ( partition by agreement_id ))::numeric( 10, 2) as wrong_agreement_total,
id as xid,
array_agg( id ) over ( partition by agreement_id, order_id ) as agg
from data),
distinc as (select *,
lag( agg ) over ( partition by agreement_id ) as prev,
id = any (lag( agg ) over ()) is not true as _distinct, -- allow to match first ID from next peer
order_suma as xorder_suma, -- repeat column to easily visually compare with _distinct
(SELECT sum(x) from (SELECT sum( DISTINCT order_suma ) AS x FROM intermediate sub_q WHERE sub_q.agreement_id = intermediate.agreement_id GROUP BY agreement_id, order_id) t) as correct_total_suma
from intermediate
)
select
*,
sum( order_suma ) filter ( where _distinct ) over ( partition by agreement_id ) as also_correct_total_suma
from distinc
A better approach (dbfiddle):
1. Assign a row number within each order: row_number() over (partition by agreement_id, order_id ) as nrow
2. Take only the first suma of each order: filter (where nrow = 1)
with data as (
select * from (values
( 1, 1, 1, 1.0049 ), (2, 1,1,1.0049), ( 3, 1,1,1.0049 ) ,
( 4, 1, 2, 1.0049 ), (5, 1,2,1.0057),
( 6, 2, 1, 1.53 ), ( 7,2,1,2.18), ( 8,2,2,3.48 )
) t (id, agreement_id, order_id, suma)
),
intermediate as (select
*,
row_number() over (partition by agreement_id, order_id ) as nrow,
(sum( suma ) over ( partition by agreement_id, order_id ))::numeric( 10, 2) as order_suma
from data)
select
*,
sum( order_suma ) filter (where nrow = 1) over (partition by agreement_id)
from intermediate
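The "round each order once, then add it once" logic can be illustrated in plain Python. This is a minimal sketch, not the author's method: `agreement_totals` is a hypothetical helper, and `ROUND_HALF_UP` is assumed to match how the `numeric( 10, 2)` cast rounds.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def agreement_totals(rows):
    """rows: (id, agreement_id, order_id, suma) -> {agreement_id: total}."""
    order_sums = defaultdict(Decimal)
    for _id, agreement_id, order_id, suma in rows:
        order_sums[(agreement_id, order_id)] += suma
    totals = defaultdict(Decimal)
    for (agreement_id, _order_id), raw in order_sums.items():
        # Round each order once, then add the rounded value exactly once.
        totals[agreement_id] += raw.quantize(CENT, rounding=ROUND_HALF_UP)
    return dict(totals)


# Same sample values as the queries above.
data = [
    (1, 1, 1, Decimal("1.0049")), (2, 1, 1, Decimal("1.0049")),
    (3, 1, 1, Decimal("1.0049")), (4, 1, 2, Decimal("1.0049")),
    (5, 1, 2, Decimal("1.0057")), (6, 2, 1, Decimal("1.53")),
    (7, 2, 1, Decimal("2.18")), (8, 2, 2, Decimal("3.48")),
]
print(agreement_totals(data))
```

For agreement 1 the orders round to 3.01 and 2.01, giving 5.02, whereas rounding the raw agreement sum of 5.0253 would give 5.03.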

How can I make this query more set based?

This is my first post here, so please let me know if I've not given everything needed.
I have been struggling to rewrite a process that has recently been causing me and my server significant performance issues.
The overall task is to identify where a customer has had to contact us back within +2 hours to +28 days of their previous contact. Currently this is being completed via the use of a cursor for all the contacts we received yesterday. This equates to approximately 50k contacts per day.
I am aware that this can be done through a cursor or a recursive CTE, but I feel like both options are bad. I am looking for another method to do the same job.
Below is a sample extract and the outcome I am expecting to see.
INSERT INTO SourceData ([CUSTOMER_KEY], [CONTACT_REFERENCE], [CONTACT_DATETIME], [EXPECTED_RESULT])
VALUES ('1', '100', '01/04/2020 09:00', 'Original Contact'),
('2', '101', '01/04/2020 10:00', 'Original Contact'),
('3', '102', '01/04/2020 11:00', 'Original Contact'),
('1', '103', '01/04/2020 12:00', 'Repeat of Contact Reference 100'),
('1', '104', '01/04/2020 13:00', 'Not Repeat - within 2 hours of previous contact'),
('1', '50' , '01/04/2020 14:00', 'Repeat of Contact Reference 103'),
('2', '105', '01/04/2020 14:00', 'Repeat of Contact Reference 101'),
('1', '106', '01/04/2020 15:00', 'Repeat of Contact Reference 104'),
('1', '200', '27/04/2020 12:00', 'Repeat of Contact Reference 106');
The process I currently follow is below. I am happy to update my post with the code, but I don't think it would be very useful given that I am looking for other solutions.
1. Identify the current latest repeat for every customer. This is there to reduce the load on the full data table: if there is already a repeat contact within the time frame, I can assign the new contact straight to it. This data is loaded into a new temp table: TempTable_Repeats_By_Customer.
2. Add all the contacts from yesterday to a temp table: TempTable_Yesterdays_Contacts.
3. Open the cursor to start processing each contact (from step 2) in order of Contact_DateTime (ascending). At the same time I use TempTable_Repeats_By_Customer to identify whether the customer has already had a repeat, and whether it was within the eligible time frame.
4. If an existing repeat exists, retrieve the details from my existing reporting table and load a new row in.
5. If no existing repeat exists, check the full data table for other contacts received during the eligible period.
6. If there are more contacts from the same customer on a single day, I then go back and update TempTable_Repeats_By_Customer with the new details.
7. Either go to the next item in the cursor, or close and deallocate it.
Any help you all can give is much appreciated.
Perhaps I am overlooking something, but I think you should be able to do this using the LAG() function.
IF OBJECT_ID('tempdb.dbo.#SourceData', 'U') IS NOT NULL
DROP TABLE #SourceData;
CREATE TABLE #SourceData
(
[CUSTOMER_KEY] VARCHAR(10)
, [CONTACT_REFERENCE] VARCHAR(10)
, [CONTACT_DATETIME] DATETIME
, [EXPECTED_RESULT] VARCHAR(50)
);
INSERT INTO #SourceData
(
[CUSTOMER_KEY]
, [CONTACT_REFERENCE]
, [CONTACT_DATETIME]
, [EXPECTED_RESULT]
)
VALUES
('1', '100', '04/01/2020 09:00', 'Original Contact')
, ('2', '101', '04/01/2020 10:00', 'Original Contact')
, ('3', '102', '04/01/2020 11:00', 'Original Contact')
, ('1', '103', '04/01/2020 12:00', 'Repeat of Contact Reference 100')
, ('1', '104', '04/01/2020 13:00', 'Not Repeat - within 2 hours of previous contact')
, ('2', '105', '04/01/2020 14:00', 'Repeat of Contact Reference 101')
, ('1', '106', '04/01/2020 15:00', 'Repeat of Contact Reference 103')
, ('1', '200', '04/27/2020 12:00', 'Repeat of Contact Reference 106');
SELECT x.CUSTOMER_KEY
, x.CONTACT_REFERENCE
, x.CONTACT_DATETIME
, x.EXPECTED_RESULT
, x.[Minutes Difference]
FROM (
SELECT
CUSTOMER_KEY
, CONTACT_REFERENCE
, CONTACT_DATETIME
, EXPECTED_RESULT
, DATEDIFF(
MINUTE
, LAG(CONTACT_DATETIME) OVER
(PARTITION BY CUSTOMER_KEY ORDER BY CONTACT_DATETIME)
, CONTACT_DATETIME
) AS [Minutes Difference]
FROM #SourceData
) x
WHERE x.[Minutes Difference] >= 120 -- 2 hours, the lower bound stated in the question
AND x.[Minutes Difference] < 40320 -- this is the number of minutes in 28 days
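As a rough illustration of what the LAG() comparison does, here is a Python sketch that compares each contact with the customer's immediately preceding contact. The function name `find_repeats` and its threshold parameters (120 minutes and 28 days) are assumptions taken from the question's wording, not from the query above.

```python
from datetime import datetime
from itertools import groupby


def find_repeats(contacts, min_minutes=120, max_minutes=28 * 24 * 60):
    """contacts: (customer_key, contact_reference, contact_datetime) tuples.

    Returns (contact_reference, previous_reference, delta_minutes) for each
    contact that follows the same customer's previous contact within the window.
    """
    ordered = sorted(contacts, key=lambda c: (c[0], c[2]))
    repeats = []
    for _customer, group in groupby(ordered, key=lambda c: c[0]):
        previous = None  # plays the role of LAG() within the partition
        for contact in group:
            if previous is not None:
                delta = int((contact[2] - previous[2]).total_seconds() // 60)
                if min_minutes <= delta <= max_minutes:
                    repeats.append((contact[1], previous[1], delta))
            previous = contact
    return repeats


sample = [
    ("1", "100", datetime(2020, 4, 1, 9)), ("2", "101", datetime(2020, 4, 1, 10)),
    ("3", "102", datetime(2020, 4, 1, 11)), ("1", "103", datetime(2020, 4, 1, 12)),
    ("1", "104", datetime(2020, 4, 1, 13)), ("2", "105", datetime(2020, 4, 1, 14)),
    ("1", "106", datetime(2020, 4, 1, 15)), ("1", "200", datetime(2020, 4, 27, 12)),
]
for row in find_repeats(sample):
    print(row)
```

On these rows the sketch flags 103, 106 and 200 for customer 1 and 105 for customer 2. Note that 106 is paired with 104, the immediately preceding contact, which is all a plain lag comparison can see.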
The following code uses a recursive CTE to process the contacts in date/time order for each customer. Like Isaac's answer it calculates a delta time in minutes which may or may not be adequate resolution for your purposes.
NB: DateDiff "returns the count (as a signed integer value) of the specified datepart boundaries crossed". If you specify a datepart of day you'll get the number of midnights crossed, not the number of 24-hour periods. For example, Monday @ 23:00 to Wednesday @ 01:00 is 26 hours or two midnights, while Tuesday @ 01:00 to Wednesday @ 03:00 is still 26 hours, but only one midnight.
declare @SourceData as Table ( Customer_Key Int, Contact_Reference Int, Contact_DateTime DateTime, Expected_Result VarChar(50) );
INSERT INTO @SourceData ([CUSTOMER_KEY], [CONTACT_REFERENCE], [CONTACT_DATETIME], [EXPECTED_RESULT])
VALUES ('1', '100', '2020-04-01 09:00', 'Original Contact'),
('2', '101', '2020-04-01 10:00', 'Original Contact'),
('3', '102', '2020-04-01 11:00', 'Original Contact'),
('1', '103', '2020-04-01 12:00', 'Repeat of Contact Reference 100'),
('1', '104', '2020-04-01 13:00', 'Not Repeat - within 2 hours of previous contact'),
('2', '105', '2020-04-01 14:00', 'Repeat of Contact Reference 101'),
('1', '106', '2020-04-01 15:00', 'Repeat of Contact Reference 103'),
('1', '200', '2020-04-27 12:00', 'Repeat of Contact Reference 106');
with
ContactsByCustomer as (
-- Add a row number to simplify processing the contacts for each customer in Contact_DateTime order.
select Customer_Key, Contact_Reference, Contact_DateTime, Expected_Result,
Row_Number() over ( partition by Customer_Key order by Contact_DateTime ) as RN
from @SourceData ),
ProcessedContacts as (
-- Process the contacts in date/time order for each customer.
-- Start with the first contact for each customer ...
select Customer_Key, Contact_Reference, Contact_DateTime, Expected_Result, RN,
Cast( 'Original Contact' as VarChar(100) ) as Computed_Result,
0 as Delta_Minutes
from ContactsByCustomer
where RN = 1
union all
-- ... and add each subsequent contact in date/time order.
select CBC.Customer_Key, CBC.Contact_Reference, CBC.Contact_DateTime, CBC.Expected_Result, CBC.RN,
Cast(
case
when PH.Delta_Minutes < 120 then
'No Repeat - within 2 hours of previous contact'
when 120 <= PH.Delta_Minutes and PH.Delta_Minutes <= 40320 then
'Repeat of Contact Reference ' + Cast( PC.Contact_Reference as VarChar(10) )
else
'Original'
end
as VarChar(100) ),
PH.Delta_Minutes
from ProcessedContacts as PC inner join
ContactsByCustomer as CBC on CBC.Customer_Key = PC.Customer_Key and CBC.RN = PC.RN + 1 cross apply
-- Using cross apply makes it easy to use the calculated value as needed.
( select DateDiff( minute, PC.Contact_DateTime, CBC.Contact_DateTime ) as Delta_Minutes ) as PH
)
-- You can uncomment the select to see the intermediate results.
-- select * from ContactsByCustomer;
select *
from ProcessedContacts
order by Customer_Key, Contact_DateTime;
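The note about DateDiff counting boundaries rather than elapsed periods can be checked with a little arithmetic. The Python below is only a sketch of that distinction; the helper names are hypothetical and the dates are arbitrary ones chosen to fall on a Monday, Tuesday and Wednesday.

```python
from datetime import datetime


def midnights_crossed(start, end):
    """Day boundaries crossed, which is what a day-level DateDiff counts."""
    return (end.date() - start.date()).days


def full_24h_periods(start, end):
    """Complete 24-hour periods actually elapsed."""
    return int((end - start).total_seconds() // 86400)


monday_23 = datetime(2020, 4, 6, 23, 0)
tuesday_01 = datetime(2020, 4, 7, 1, 0)
wednesday_01 = datetime(2020, 4, 8, 1, 0)
wednesday_03 = datetime(2020, 4, 8, 3, 0)

# Both spans are 26 hours long, yet they cross different numbers of midnights.
print(midnights_crossed(monday_23, wednesday_01), full_24h_periods(monday_23, wednesday_01))
print(midnights_crossed(tuesday_01, wednesday_03), full_24h_periods(tuesday_01, wednesday_03))
```

Computing the difference in minutes, as both answers do, avoids the problem for windows expressed in hours or days.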

Move Saturday into Friday then AVG

I'm working with a table of about 165M rows. I'm creating an average daily usage figure, but only for the working week: I need to count Saturday's inventory transactions on Friday and move Sunday's into Monday.
CREATE TABLE #temptable ( [ITEMID] nvarchar(20), [Daily Usage] decimal(38,10), [CalendarDate] date )
INSERT INTO #temptable
VALUES
( N'A24519-01', 0.0000000000, N'2019-02-18T00:00:00' ),
( N'A24519-01', 7.0000000000, N'2019-02-19T00:00:00' ),
( N'A24519-01', 10.0000000000, N'2019-02-20T00:00:00' ),
( N'A24519-01', 4.0000000000, N'2019-02-21T00:00:00' ),
( N'A24519-01', 11.0000000000, N'2019-02-22T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-02-23T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-02-24T00:00:00' ),
( N'A24519-01', 9.0000000000, N'2019-02-25T00:00:00' ),
( N'A24519-01', 5.0000000000, N'2019-02-26T00:00:00' ),
( N'A24519-01', 8.0000000000, N'2019-02-27T00:00:00' ),
( N'A24519-01', 17.0000000000, N'2019-02-28T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-03-01T00:00:00' ),
( N'A24519-01', 1.0000000000, N'2019-03-02T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-03-03T00:00:00' ),
( N'A24519-01', 1.0000000000, N'2019-03-04T00:00:00' ),
( N'A24519-01', 12.0000000000, N'2019-03-05T00:00:00' ),
( N'A24519-01', 4.0000000000, N'2019-03-06T00:00:00' ),
( N'A24519-01', 14.0000000000, N'2019-03-07T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-03-08T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-03-09T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-03-10T00:00:00' ),
( N'A24519-01', 4.0000000000, N'2019-03-11T00:00:00' ),
( N'A24519-01', 9.0000000000, N'2019-03-12T00:00:00' ),
( N'A24519-01', 6.0000000000, N'2019-03-13T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-03-14T00:00:00' ),
( N'A24519-01', 14.0000000000, N'2019-03-15T00:00:00' ),
( N'A24519-01', 1.0000000000, N'2019-03-16T00:00:00' ),
( N'A24519-01', 0.0000000000, N'2019-03-17T00:00:00' )
So if I run the query below on the data above, I get 4.89:
SELECT AVG(1 * [Daily Usage])
FROM #temptable
I'm trying to get 6.85. I don't know how to move the numbers from Sat/Sun into Fri/Mon and then remove the weekends from #temptable.
If anyone has the same issue and comes here, below is what I ended up using in production for this problem.
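--Move Saturday data into Friday and move Sunday data into Monday (Working days for buyers...)
--Assumes SET DATEFIRST 7 (the us_english default), where DATEPART(dw, ...) returns 1 for Sunday and 7 for Saturday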
UPDATE
EDU
SET
EDU.[Daily Usage] = EDU.[Daily Usage] + ND.[Daily Usage]
FROM #ExplodedDailyUsage AS EDU
INNER JOIN (SELECT
t.ITEMID
, t.[Daily Usage]
, t.CalendarDate
, DATEPART(dw, t.CalendarDate) DOW
, DATEADD(
DAY, CASE DATEPART(WEEKDAY, t.CalendarDate) WHEN 7 THEN -1 WHEN 1 THEN 1 ELSE 0 END, t.CalendarDate) AS NewDate
FROM #ExplodedDailyUsage AS t
WHERE DATEPART(dw, t.CalendarDate) IN ( 7, 1 )) ND ON ND.NewDate = EDU.CalendarDate
AND ND.ITEMID = EDU.ITEMID;
--Delete Saturdays and Sundays
DELETE FROM #ExplodedDailyUsage WHERE DATEPART(dw, CalendarDate) IN ( 7, 1 );
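The arithmetic behind the expected 6.85 can be verified with a short Python sketch that mirrors the two statements above: fold weekend usage into the adjacent weekday, then average over weekdays only. The function name `working_week_average` is hypothetical; the usage values are the 28 sample rows from the question.

```python
from datetime import date, timedelta


def working_week_average(daily_usage):
    """daily_usage: {date: usage}. Returns the average over weekdays only."""
    weekdays = {d: u for d, u in daily_usage.items() if d.weekday() < 5}
    for day, usage in daily_usage.items():
        if day.weekday() == 5:      # Saturday -> previous Friday
            target = day - timedelta(days=1)
        elif day.weekday() == 6:    # Sunday -> following Monday
            target = day + timedelta(days=1)
        else:
            continue
        # Like the INNER JOIN, weekend usage is dropped if the target day is absent.
        if target in weekdays:
            weekdays[target] += usage
    return sum(weekdays.values()) / len(weekdays)


usage = [0, 7, 10, 4, 11, 0, 0, 9, 5, 8, 17, 0, 1, 0,
         1, 12, 4, 14, 0, 0, 0, 4, 9, 6, 0, 14, 1, 0]
start = date(2019, 2, 18)  # a Monday
sample = {start + timedelta(days=i): u for i, u in enumerate(usage)}
print(working_week_average(sample))
```

The 28 rows sum to 137; spread over all 28 days that is about 4.89, and over the 20 weekdays it is 6.85.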

Is this correct to use DISTINCT instead of GROUP BY? [duplicate]

This question already has answers here:
How to `sum( DISTINCT <column> ) OVER ()` using window function?
(2 answers)
Closed 1 year ago.
Example data:
id | docn | item | suma
---+------+------+-----
 1 |   33 | x    |   10
 1 |   33 | y    |   20
 2 |   37 | a    |   10
 2 |   37 | b    |   20
 2 |   37 | c    |   30
To group results I can write:
SELECT sum( suma ),
(ocd.o).*
FROM order_cost_details() ocd
where (ocd.o).id IN ( 6154, 10805 )
GROUP BY ocd.o
But alongside the grouping I also want to select the last_value for each group. The following does not work:
SELECT sum( suma ),
(ocd.o).*,
last_value( ocd.c ) OVER (PARTITION BY ocd.o )
FROM order_cost_details() ocd
where (ocd.o).id IN ( 6154, 10805 )
GROUP BY ocd.o
SQL Error [42803]: ERROR: column "ocd.c" must appear in the GROUP BY clause or be used in an aggregate function
So I rewrote my query as follows:
SELECT DISTINCT sum( suma ) OVER ( PARTITION BY ocd.o ),
(ocd.o).*,
last_value( ocd.c ) OVER (PARTITION BY ocd.o )
FROM order_cost_details() ocd
where (ocd.o).id IN ( 6154, 10805 )
The results seem as expected, with the correct last_value.
But I am not sure: is it correct to use DISTINCT instead of GROUP BY here?
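last_value() often does not work as expected; see "Window Functions: last_value(ORDER BY ... ASC) same as last_value(ORDER BY ... DESC)". With an ORDER BY in the window definition, the default frame ends at the current row (or its last peer), so last_value() returns that row's value rather than the last value of the partition.
A more reliable way to get the last value of a partition is to take the first value in descending order: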
SELECT
first_value(my_column) OVER (PARTITION BY partitioned_column ORDER BY order_column DESC)
FROM
...
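To see why the two forms differ, here is a small Python model of the frames involved. It is a simplified sketch that assumes no ties in the ordering column; the function names are hypothetical.

```python
def last_value_default_frame(ordered_values):
    """last_value() with ORDER BY and the default frame (no ties in the sort key)."""
    # The frame for row i is rows 0..i, so the "last" value is the row itself.
    return [ordered_values[: i + 1][-1] for i in range(len(ordered_values))]


def first_value_descending(ordered_values):
    """first_value() over the same partition sorted in descending order."""
    descending = list(reversed(ordered_values))
    # Every row sees the partition's first row, i.e. the true last value.
    return [descending[0] for _ in descending]


partition = ["a", "b", "c"]  # already sorted by the ORDER BY column
print(last_value_default_frame(partition))  # ['a', 'b', 'c']
print(first_value_descending(partition))    # ['c', 'c', 'c']
```

In the first case each row simply gets its own value back, which is why the result looks the same for ascending and descending order.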
You can use a subselect:
SELECT sum(suma),
(o).*,
last_c
FROM (SELECT suma,
ocd.o,
last_value(ocd.c)
OVER (PARTITION BY ocd.o
ORDER BY some_col
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS last_c
FROM order_cost_details() ocd
where (ocd.o).id IN (6154, 10805)
) AS q
GROUP BY o, last_c;
From IRC:
RhodiumToad: using DISTINCT ON is almost always a mistake (do bear in mind it's completely non-standard)
The basic rule of thumb is that you use GROUP BY when you want to reduce the number of output rows, and window functions when you want to keep the number of rows the same
Nothing stops you doing a total over the orders using a window function after the group by
So I rewrote my query to look like this:
SELECT *,
sum( t.group_suma ) OVER( PARTITION BY (t.o).id ) AS total_suma
FROM (
SELECT
sum( ocd.item_cost ) AS group_cost,
sum( ocd.item_suma ) AS group_suma,
max( (ocd.ic).consumed ) AS consumed,
ocd.o
FROM order_cost_details() ocd
where (ocd.o).id IN ( 6154, 10805 )
GROUP BY ocd.o, (ocd.ic).consumed_period
) t
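The "GROUP BY first, window function afterwards" pattern can be sketched in Python as two passes. This is an illustration only: `group_then_total` is a hypothetical helper, and the rows and period labels are made-up sample values, not output of `order_cost_details()`.

```python
from collections import defaultdict


def group_then_total(rows):
    """rows: (order_id, consumed_period, item_suma) tuples."""
    group_suma = defaultdict(int)
    for order_id, period, item_suma in rows:
        group_suma[(order_id, period)] += item_suma   # GROUP BY step: fewer rows
    total_suma = defaultdict(int)
    for (order_id, _period), suma in group_suma.items():
        total_suma[order_id] += suma                  # window step: row count unchanged
    return [(order_id, period, suma, total_suma[order_id])
            for (order_id, period), suma in sorted(group_suma.items())]


rows = [(1, "p1", 10), (1, "p1", 20), (2, "p1", 10), (2, "p2", 20), (2, "p2", 30)]
for row in group_then_total(rows):
    print(row)
```

Each output row is one group, carrying both its own sum and the total for its order.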