PostgreSQL: duplicate last row in group

I have a set of data like this.
detail_id working_time employee_id additional_info
10 2020-08-26 01:00:00 1 10
10 2020-08-26 02:00:00 1 20
10 2020-08-26 03:00:00 1 30
10 2020-08-26 04:00:00 1 40
10 2020-08-26 05:00:00 1 50
10 2020-08-26 06:00:00 1 60
10 2020-08-26 07:00:00 1 70
10 2020-08-26 08:00:00 2 80
10 2020-08-26 09:00:00 2 90
10 2020-08-26 10:00:00 2 100
10 2020-08-26 11:00:00 2 110
10 2020-08-26 12:00:00 2 120
10 2020-08-26 13:00:00 2 130
10 2020-08-26 14:00:00 2 140
10 2020-08-26 15:00:00 2 150
10 2020-08-26 16:00:00 1 160
10 2020-08-26 17:00:00 1 170
10 2020-08-26 18:00:00 1 180
Imagine that we have two workers who are working on the same detail in two shifts.
The first employee works from 01:00:00 to 07:00:00, the second works from 07:00:00 to 15:00:00, and then the first works again from 15:00:00 to 18:00:00.
So, basically, I need to duplicate the last row of each employee_id group in the SELECT whenever the employee_id changes. The final result should look like this:
detail_id working_time employee_id additional_info
10 2020-08-26 01:00:00 1 10
10 2020-08-26 02:00:00 1 20
10 2020-08-26 03:00:00 1 30
10 2020-08-26 04:00:00 1 40
10 2020-08-26 05:00:00 1 50
10 2020-08-26 06:00:00 1 60
10 2020-08-26 07:00:00 1 70
10 2020-08-26 07:00:00 2 70
10 2020-08-26 08:00:00 2 80
10 2020-08-26 09:00:00 2 90
10 2020-08-26 10:00:00 2 100
10 2020-08-26 11:00:00 2 110
10 2020-08-26 12:00:00 2 120
10 2020-08-26 13:00:00 2 130
10 2020-08-26 14:00:00 2 140
10 2020-08-26 15:00:00 2 150
10 2020-08-26 15:00:00 1 150
10 2020-08-26 16:00:00 1 160
10 2020-08-26 17:00:00 1 170
10 2020-08-26 18:00:00 1 180
I know how to find the place where employee_id changes by using the lag function:
WHEN lag(employee_id) OVER (ORDER BY detail_id, working_time) <> employee_id THEN ...
but I don't know how to duplicate the row.
Link to SQLFiddle

You can get the lead() worker, as you already know, and compare it to the current worker. I suspect, though, that you want PARTITION BY detail_id rather than ORDER BY; your example isn't clear enough in that respect, as there is only one detail_id.
A CASE expression is of little use here, as it cannot produce additional rows. But you can compare the lead() worker against the current worker in a WHERE clause: if they differ, the row is one of the additional rows. Use UNION ALL to add those to the "plain" rows from the table.
If you want to order the end result, put that UNION ALL operation in yet another derived table and SELECT from it with an ORDER BY.
SELECT y.detail_id,
       y.working_time,
       y.employee_id,
       y.additional_info
FROM (SELECT w.detail_id,
             w.working_time,
             w.employee_id,
             w.additional_info
      FROM workers w
      UNION ALL
      SELECT x.detail_id,
             x.working_time,
             x.lead_employee_id AS employee_id,
             x.additional_info
      FROM (SELECT w.detail_id,
                   w.working_time,
                   w.employee_id,
                   w.additional_info,
                   lead(w.employee_id) OVER (PARTITION BY w.detail_id
                                             ORDER BY w.working_time) AS lead_employee_id
            FROM workers w) x
      WHERE x.lead_employee_id <> x.employee_id) y
ORDER BY y.working_time;
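A quick way to sanity-check the UNION ALL approach is to run the same shape against SQLite (3.25+ is needed for window functions), which ships with Python. The table name `workers` comes from the query above; the sample data is trimmed to the rows around the first shift change.

```python
import sqlite3

# In-memory table with the question's columns, trimmed to one shift change.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE workers (detail_id INT, working_time TEXT,"
            " employee_id INT, additional_info INT)")
con.executemany("INSERT INTO workers VALUES (?, ?, ?, ?)", [
    (10, "2020-08-26 06:00:00", 1, 60),
    (10, "2020-08-26 07:00:00", 1, 70),
    (10, "2020-08-26 08:00:00", 2, 80),
])

result = con.execute("""
    SELECT detail_id, working_time, employee_id, additional_info
    FROM workers
    UNION ALL
    SELECT detail_id, working_time, lead_employee_id, additional_info
    FROM (SELECT detail_id, working_time, employee_id, additional_info,
                 lead(employee_id) OVER (PARTITION BY detail_id
                                         ORDER BY working_time) AS lead_employee_id
          FROM workers)
    WHERE lead_employee_id <> employee_id  -- boundary rows get duplicated
    ORDER BY working_time, employee_id
""").fetchall()
for row in result:
    print(row)
```

The 07:00:00 row comes out twice, once per employee, matching the desired output above.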
What is odd, though, is that your rule doesn't seem to apply to the row with 2020-08-26 01:00:00. Why is that one the actual start time, while in all the other cases it's not the actual time but the time of the previous record (which doesn't exist here, I know)? Maybe you should rework how you store the data and always insert the actual starting and ending times as well.
And your fiddle uses MySQL instead of Postgres, by the way.


Fetch last available value if there is NULL

I have a table that is essentially a purchases table with purchase prices. When a purchase is made, it is recorded at a given hour; in the table below, ABC-123 was purchased on 2022-1-20 at 12:00. I want the NULL values to show 20 as long as a new purchase price has not been punched in, and the same for the other id_code.
id_code hour purchase_price
ABC-123 2022-1-20 12:00 20
ABC-123 2022-1-20 13:00 NULL
ABC-123 2022-1-20 14:00 NULL
BCD-123 2022-1-20 12:00 35
BCD-123 2022-1-20 13:00 36
BCD-123 2022-1-20 14:00 NULL
The output table should look like this, replacing the NULLs with the previously available price for the particular id_code:
id_code hour purchase_price
ABC-123 2022-1-20 12:00 20
ABC-123 2022-1-20 13:00 20
ABC-123 2022-1-20 14:00 20
BCD-123 2022-1-20 12:00 35
BCD-123 2022-1-20 13:00 36
BCD-123 2022-1-20 14:00 36
I did find a similar question here, but that doesn't seem to work because my IDs are not incremental integers, I think.
You can create a view with an aggregate function. Try this:
CREATE VIEW test_view AS
( SELECT id_code
, hour
, (array_agg(purchase_price) FILTER (WHERE purchase_price IS NOT NULL) OVER (PARTITION BY id_code ORDER BY hour DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))[1]
FROM test
)
Result :
id_code hour array_agg
ABC-123 2022-01-20 12:00:00 20
ABC-123 2022-01-20 13:00:00 20
ABC-123 2022-01-20 14:00:00 20
BCD-123 2022-01-20 12:00:00 35
BCD-123 2022-01-20 13:00:00 36
BCD-123 2022-01-20 14:00:00 36
See the demo in dbfiddle.
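The same gap-fill can be sketched portably with Python's built-in SQLite: array_agg is Postgres-only, so a correlated subquery that picks the latest non-NULL price at or before each hour stands in for it. Table and column names are taken from the question; the hours are zero-padded here so that text comparison sorts chronologically.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id_code TEXT, hour TEXT, purchase_price INT)")
con.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    ("ABC-123", "2022-01-20 12:00", 20),
    ("ABC-123", "2022-01-20 13:00", None),
    ("ABC-123", "2022-01-20 14:00", None),
    ("BCD-123", "2022-01-20 12:00", 35),
    ("BCD-123", "2022-01-20 13:00", 36),
    ("BCD-123", "2022-01-20 14:00", None),
])

filled = con.execute("""
    SELECT id_code, hour,
           (SELECT t2.purchase_price      -- latest non-NULL price so far
            FROM test t2
            WHERE t2.id_code = t.id_code
              AND t2.hour <= t.hour
              AND t2.purchase_price IS NOT NULL
            ORDER BY t2.hour DESC
            LIMIT 1) AS purchase_price
    FROM test t
    ORDER BY id_code, hour
""").fetchall()
for row in filled:
    print(row)
```

Every NULL is replaced by the most recent earlier price within the same id_code, matching the expected output above.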

How to get the minimum value of unique items based upon a DATEDIFF function in T-SQL?

I am trying to figure out the minimum time elapsed between two columns, grouped by values in a third column
ID Start Time End Time
1 2021-08-22 00:00:00 2021-08-24 00:00:00
1 2021-08-21 00:00:00 2021-08-24 00:00:00
2 2021-08-22 00:00:00 2021-08-24 00:00:00
2 2021-08-21 00:00:00 2021-08-24 00:00:00
3 2021-08-22 00:00:00 2021-08-24 00:00:00
3 2021-08-21 00:00:00 2021-08-24 00:00:00
From this table, I would like to get the results:
ID Elapsed Time
1 48 hours
2 48 hours
3 48 hours
Currently I have this (non-working) SQL query:
SELECT ID, datediff(hour, Start Time, End Time) as diff
FROM t
WHERE
MIN(diff)
GROUP BY ID
Jacob, this should give you the results you are looking for:
SELECT
ID,
MIN(DATEDIFF (HOUR, StartTime, EndTime)) AS diff
FROM
t
GROUP BY
ID;
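The corrected query can be sketched in Python's built-in SQLite. SQLite has no DATEDIFF, so the difference of julianday() values (in days) times 24 stands in for DATEDIFF(HOUR, ...); the column names without spaces are an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ID INT, StartTime TEXT, EndTime TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2021-08-22 00:00:00", "2021-08-24 00:00:00"),
    (1, "2021-08-21 00:00:00", "2021-08-24 00:00:00"),
    (2, "2021-08-22 00:00:00", "2021-08-24 00:00:00"),
    (2, "2021-08-21 00:00:00", "2021-08-24 00:00:00"),
])

# julianday() returns days as a float, so * 24 converts to hours.
diffs = con.execute("""
    SELECT ID,
           MIN((julianday(EndTime) - julianday(StartTime)) * 24) AS diff
    FROM t
    GROUP BY ID
    ORDER BY ID
""").fetchall()
print(diffs)
```

Each ID reports its smallest elapsed time (48 hours), as in the expected output above.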

Fetch the data of the last 20 Tuesdays in AWS Redshift

I want to find the data for the last 20 Tuesdays.
I need to write a query that produces the desired result shown below.
Date value
2020-03-03 01:12:15 5
2020-02-25 07:12:15 13
2020-02-24 08:12:15 1
2020-02-23 09:12:15 32
2020-02-22 10:12:15 7
2020-02-21 11:12:15 43
2020-02-20 12:12:15 7
2020-02-19 13:12:15 1
2020-02-18 14:12:15 31
2020-02-17 15:12:15 14
2020-02-16 15:12:15 2
2020-02-15 15:12:15 14
2020-02-14 14:12:15 31
2020-02-13 15:12:15 11
2020-02-12 15:12:15 2
2020-02-11 15:12:15 14
2020-02-10 15:12:15 12
and so on.
My desired output is:
Date value
2020-02-24 01:12:15 1
2020-02-17 07:12:15 14
2020-02-10 14:12:15 12
and so on
You could probably use:
WHERE
DATE_PART(dow, date_field) = 2
AND date_field >= CURRENT_DATE - INTERVAL '140 DAYS'
See: DATE_PART Function - Amazon Redshift
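To make the logic concrete, the same filter can be expressed in plain Python: keep rows whose date is a Tuesday and falls within the last 140 days (20 weeks). In Redshift, DATE_PART(dow, ...) counts from Sunday = 0, so 2 is Tuesday; Python's `date.weekday()` counts from Monday = 0, so Tuesday is 1. The sample rows and the fixed "today" below are made up.

```python
from datetime import date, timedelta

def last_20_tuesdays(rows, today):
    cutoff = today - timedelta(days=140)  # 20 weeks back
    # date.weekday(): Monday == 0, so Tuesday == 1
    return [(d, v) for d, v in rows if d.weekday() == 1 and d >= cutoff]

rows = [
    (date(2020, 2, 25), 13),  # Tuesday
    (date(2020, 2, 24), 1),   # Monday
    (date(2020, 2, 18), 31),  # Tuesday
]
print(last_20_tuesdays(rows, today=date(2020, 3, 3)))
```

Only the Tuesday rows survive the filter.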

PostgreSQL - filter function for dates

I am trying to use the built-in FILTER clause in PostgreSQL to restrict a sum to a date range, so that only entries falling within this time-frame are added up.
I cannot understand why the filter isn't being applied.
I am trying to filter for all product transactions that have a created_at date of the previous month (so in this case that were created in June 2017).
SELECT pt.created_at::date, pt.customer_id,
sum(pt.amount/100::double precision) filter (where (date_part('month', pt.created_at) =date_part('month', NOW() - interval '1 month') and
date_part('year', pt.created_at) = date_part('year', NOW()) ))
from
product_transactions pt
LEFT JOIN customers c
ON c.id= pt.customer_id
GROUP BY pt.created_at::date,pt.customer_id
Below are my expected results (the sum of the amount for each day in the previous month, per customer_id, where an entry for that day exists) and the actual results I get from the query (using date_trunc).
Expected results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
Results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
2017-05-30 XX YYY
2017-05-25 XX YYY
2017-05-15 XX YYY
2017-04-30 XX YYY
2017-03-02 XX YYY
2016-11-02 XX YYY
The actual results give me sums for all dates in the database, so no date time-frame is being applied in the query, for a reason I cannot understand: I'm seeing dates that are not in June 2017, including dates from previous years.
Use the date_trunc() function:
SELECT pt.created_at::date, pt.customer_id, c.name,
       sum(pt.amount/100::double precision) FILTER (WHERE date_trunc('month', pt.created_at) = date_trunc('month', NOW() - interval '1 month'))
FROM product_transactions pt
LEFT JOIN customers c
  ON c.id = pt.customer_id
GROUP BY pt.created_at::date, pt.customer_id, c.name
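The month comparison can be sketched in Python's built-in SQLite: strftime('%Y-%m', ...) stands in for date_trunc('month', ...), and a CASE expression stands in for the FILTER clause (which older SQLite versions lack). The sample rows and the fixed reference date '2017-07-01' (standing in for NOW()) are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE product_transactions"
            " (created_at TEXT, customer_id INT, amount INT)")
con.executemany("INSERT INTO product_transactions VALUES (?, ?, ?)", [
    ("2017-06-28 10:00:00", 15, 3480),
    ("2017-05-30 09:00:00", 15, 9999),
])

# Truncating BOTH sides to the month keeps year and month together in a
# single comparison, which is what date_trunc buys over date_part.
result = con.execute("""
    SELECT date(created_at) AS day, customer_id,
           SUM(CASE WHEN strftime('%Y-%m', created_at) =
                         strftime('%Y-%m', '2017-07-01', '-1 month')
                    THEN amount / 100.0 END) AS amount
    FROM product_transactions
    GROUP BY date(created_at), customer_id
    ORDER BY day
""").fetchall()
print(result)
```

Note that conditional aggregation, like FILTER, only nulls out the sum for non-matching groups (the May row still appears, with NULL); to drop those rows from the output entirely, the date condition belongs in a WHERE clause instead.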

Trigger an event when the event count is the maximum of the last 12 months window

I have a requirement like this: trigger an event when the idle well count is the maximum of the last 12 months window.
For Example:
Well_date Count
1986-01-01 00:00:00 17
1986-02-01 00:00:00 16
1986-03-01 00:00:00 23
1986-04-01 00:00:00 33
1986-05-01 00:00:00 31
1986-06-01 00:00:00 42
1986-07-01 00:00:00 43
1986-08-01 00:00:00 43
1986-09-01 00:00:00 41
1986-10-01 00:00:00 42
1986-11-01 00:00:00 46
1986-12-01 00:00:00 52
Output:
1986-12-01 00:00:00 52
If, say, the event count is the minimum of the last 11 months, then it will be ignored.
Thanks in advance
This one will give you a stream of last max well counts, i.e. the max excluding the current event:
insert into LastMaxStream select rstream max(well_count) as lastMax from SomeEvent
The LastMaxStream can be used to compare:
#name('out') select * from SomeEvent(well_count > (select lastMax from LastMaxStream.std:lastevent()));
There may be other solutions, but that is the one that comes to mind. To consider some time period, add it to the group-by clause, or declare a context that starts when 1986 starts and ends when 1986 ends, for example.
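Outside Esper, the same idea can be sketched language-agnostically: fire whenever the current monthly count exceeds the maximum of the trailing window (here the 11 previous months, i.e. 12 months including the current one). The function and the short sample series below are made up for illustration.

```python
from collections import deque

def new_max_events(counts, window=11):
    history = deque(maxlen=window)  # trailing counts only, current excluded
    fired = []
    for month, count in counts:
        if history and count > max(history):
            fired.append((month, count))
        history.append(count)
    return fired

sample = [("1986-10", 42), ("1986-11", 46), ("1986-12", 52)]
print(new_max_events(sample))
```

As with the EPL above, every event that beats the running trailing maximum fires, so an ascending series triggers repeatedly.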