Fetch the data of the last 20 Tuesdays in Amazon Redshift

I want to find the data for the last 20 Tuesdays.
What do I have to write to get the desired result shown below?
Date value
2020-03-03 01:12:15 5
2020-02-25 07:12:15 13
2020-02-24 08:12:15 1
2020-02-23 09:12:15 32
2020-02-22 10:12:15 7
2020-02-21 11:12:15 43
2020-02-20 12:12:15 7
2020-02-19 13:12:15 1
2020-02-18 14:12:15 31
2020-02-17 15:12:15 14
2020-02-16 15:12:15 2
2020-02-15 15:12:15 14
2020-02-14 14:12:15 31
2020-02-13 15:12:15 11
2020-02-12 15:12:15 2
2020-02-11 15:12:15 14
2020-02-10 15:12:15 12
and so on.
My desired output is:
Date value
2020-02-24 01:12:15 1
2020-02-17 07:12:15 14
2020-02-10 14:12:15 12
and so on

You could probably use:
WHERE DATE_PART(dow, date_field) = 2
  AND date_field >= CURRENT_DATE - INTERVAL '140 DAYS'
Here 2 is Tuesday (DATE_PART(dow, ...) counts Sunday as 0), and 140 days covers exactly 20 weeks.
See: DATE_PART Function - Amazon Redshift
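The row-filtering logic in the answer can be sanity-checked outside Redshift. A minimal Python sketch (`is_tuesday` and `keep_row` are hypothetical helper names, not part of the question):

```python
from datetime import date, timedelta

def is_tuesday(d: date) -> bool:
    # Python's weekday(): Monday == 0, so Tuesday == 1.
    # Redshift's DATE_PART(dow, ...): Sunday == 0, so Tuesday == 2.
    return d.weekday() == 1

def keep_row(d: date, today: date, n: int = 20) -> bool:
    # The last n Tuesdays always lie within the last n * 7 days,
    # which is where the answer's 140-day interval comes from.
    return is_tuesday(d) and d >= today - timedelta(days=n * 7)
```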

Related

Maximum count of overlapping intervals in PostgreSQL

Suppose there is a table structured as follows:
id start end
--------------------
01 00:18 00:23
02 00:22 00:31
03 00:23 00:48
04 00:23 00:39
05 00:24 00:25
06 00:24 00:31
07 00:24 00:38
08 00:25 00:37
09 00:26 00:42
10 00:31 00:34
11 00:33 00:38
The objective is to compute the overall maximum number of rows having been active (i.e. between start and end) at any given moment in time. This would be relatively straightforward using a procedural algorithm, but I'm not sure how to do this in SQL.
In the above example, this maximum value would be 8, at the 00:31 timestamp, where the active rows were 2, 3, 4, 6, 7, 8, 9 and 10.
Obtaining the timestamp(s) and the active rows corresponding to the maximum is not important; all that is needed is the value itself.
My first thought was to use generate_series() to iterate over every minute, count the active intervals at each one, and then take the max of those counts.
You can improve on that idea by iterating only over the "start" values from the table, because one of the "start" points always falls inside the interval with the maximum number of active rows.
select id, start,
       (select count(1) from tbl t where tbl.start between t.start and t."end")
from tbl;
Here are the results:
id start count
-----------------
1 00:18:00 1
2 00:22:00 2
3 00:23:00 4
4 00:23:00 4
5 00:24:00 6
6 00:24:00 6
7 00:24:00 6
8 00:25:00 7
9 00:26:00 7
10 00:31:00 8
11 00:33:00 7
So, this query gives you the maximum number of rows that were active at once:
select
max((select count(1) from tbl t where tbl.start between t.start and t."end"))
from tbl;
max
-----
8
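The answer's key observation (the maximum overlap is always attained at one of the start points) translates directly to a few lines of Python. A sketch over the sample data, using plain minute numbers in place of times:

```python
# Each tuple is (start, end) in minutes, mirroring the sample table.
intervals = [(18, 23), (22, 31), (23, 48), (23, 39), (24, 25), (24, 31),
             (24, 38), (25, 37), (26, 42), (31, 34), (33, 38)]

def max_active(ivals):
    # For every start point, count how many intervals contain it
    # (inclusive), then take the maximum -- the same shape as the
    # correlated subquery in the SQL answer.
    return max(sum(s <= p <= e for s, e in ivals) for p, _ in ivals)
```

`max_active(intervals)` returns 8, matching the query's result.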

Postgresql duplicate last row in group

I have a set of data like this.
detail_id working_time employee_id additional_info
10 2020-08-26 01:00:00 1 10
10 2020-08-26 02:00:00 1 20
10 2020-08-26 03:00:00 1 30
10 2020-08-26 04:00:00 1 40
10 2020-08-26 05:00:00 1 50
10 2020-08-26 06:00:00 1 60
10 2020-08-26 07:00:00 1 70
10 2020-08-26 08:00:00 2 80
10 2020-08-26 09:00:00 2 90
10 2020-08-26 10:00:00 2 100
10 2020-08-26 11:00:00 2 110
10 2020-08-26 12:00:00 2 120
10 2020-08-26 13:00:00 2 130
10 2020-08-26 14:00:00 2 140
10 2020-08-26 15:00:00 2 150
10 2020-08-26 16:00:00 1 160
10 2020-08-26 17:00:00 1 170
10 2020-08-26 18:00:00 1 180
Imagine that we have two workers who are working on the same detail in two shifts.
The first employee works from 01:00:00 to 07:00:00, the second from 07:00:00 to 15:00:00, and the first again from 15:00:00 to 18:00:00.
So, basically, I need to duplicate the last row of each employee_id group in the select whenever the employee_id changes. The final result should look like:
detail_id working_time employee_id additional_info
10 2020-08-26 01:00:00 1 10
10 2020-08-26 02:00:00 1 20
10 2020-08-26 03:00:00 1 30
10 2020-08-26 04:00:00 1 40
10 2020-08-26 05:00:00 1 50
10 2020-08-26 06:00:00 1 60
10 2020-08-26 07:00:00 1 70
10 2020-08-26 07:00:00 2 70
10 2020-08-26 08:00:00 2 80
10 2020-08-26 09:00:00 2 90
10 2020-08-26 10:00:00 2 100
10 2020-08-26 11:00:00 2 110
10 2020-08-26 12:00:00 2 120
10 2020-08-26 13:00:00 2 130
10 2020-08-26 14:00:00 2 140
10 2020-08-26 15:00:00 2 150
10 2020-08-26 15:00:00 1 150
10 2020-08-26 16:00:00 1 160
10 2020-08-26 17:00:00 1 170
10 2020-08-26 18:00:00 1 180
I know how to find the places where employee_id changes by using the lag window function:
WHEN lag(employee_id) OVER (ORDER BY detail_id, working_time) <> employee_id THEN ...
but I don't know how to duplicate the row.
Link to SQLFiddle
You can get the lead() worker as you already know and compare it to the current worker. But I suspect that you want PARTITION BY detail_id rather than putting detail_id in the ORDER BY. Your example isn't clear enough in that respect, as there is only one detail_id.
A CASE expression is of little use here, as it cannot produce additional rows. But you can compare the lead() worker against the current worker in a WHERE clause: if they differ, the row is one of the additional rows. Use UNION ALL to add those to the plain rows from the table.
If you want to order the end result, put that UNION ALL operation in yet another derived table and SELECT from it with an ORDER BY.
SELECT y.detail_id,
       y.working_time,
       y.employee_id,
       y.additional_info
FROM (SELECT w.detail_id,
             w.working_time,
             w.employee_id,
             w.additional_info
      FROM workers w
      UNION ALL
      SELECT x.detail_id,
             x.working_time,
             x.lead_employee_id employee_id,
             x.additional_info
      FROM (SELECT w.detail_id,
                   w.working_time,
                   w.employee_id,
                   w.additional_info,
                   lead(w.employee_id) OVER (PARTITION BY w.detail_id
                                             ORDER BY w.working_time) lead_employee_id
            FROM workers w) x
      WHERE x.lead_employee_id <> x.employee_id) y
ORDER BY y.working_time;
What is odd, though, is that your rule doesn't seem to apply to the row with 2020-08-26 01:00:00: that one carries the actual start time, while every other shift boundary is represented by the previous row's time. Maybe you should rework how you store the data and always insert the actual starting and ending times explicitly.
And your fiddle uses MySQL instead of Postgres, by the way.
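The UNION ALL approach can be exercised end-to-end with Python's sqlite3 module. This is a sketch, assuming an SQLite build new enough for window functions (3.25+); the workers table is rebuilt from the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workers (detail_id INT, working_time TEXT,"
             " employee_id INT, additional_info INT)")
# (hour, employee_id) pairs matching the question's sample data.
shifts = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1),
          (8, 2), (9, 2), (10, 2), (11, 2), (12, 2), (13, 2), (14, 2),
          (15, 2), (16, 1), (17, 1), (18, 1)]
conn.executemany("INSERT INTO workers VALUES (?, ?, ?, ?)",
                 [(10, f"2020-08-26 {h:02d}:00:00", e, h * 10)
                  for h, e in shifts])

rows = conn.execute("""
    SELECT y.detail_id, y.working_time, y.employee_id, y.additional_info
    FROM (SELECT detail_id, working_time, employee_id, additional_info
          FROM workers
          UNION ALL
          SELECT x.detail_id, x.working_time, x.lead_employee_id,
                 x.additional_info
          FROM (SELECT *, LEAD(employee_id) OVER (PARTITION BY detail_id
                                                  ORDER BY working_time)
                          AS lead_employee_id
                FROM workers) x
          WHERE x.lead_employee_id <> x.employee_id) y
    ORDER BY y.working_time""").fetchall()
```

The result contains 20 rows: the 18 originals plus one duplicated row at each of the two shift boundaries (07:00 and 15:00), carrying the next shift's employee_id.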

PostgreSQL - How can I SUM until a certain hour of the day?

I'm trying to create a metric for a PostgreSQL integrated dashboard which would show today's "Total Payment Value" (TPV) of a certain product, as well as yesterday's TPV of the same product, up until the same moment as today, so if I'm accessing the dashboard at 5 pm, it will show what it was yesterday until 5 pm and today's TPV.
Edit: my question wasn't very clear, so I'm adding a few more lines and editing the query, which had a mistake.
I tried this:
select
    sum(case when table.product in (13,14,15,16) then amount else 0 end) as "TPV",
    date_trunc('day', table.date) as "Day"
from table
where date > current_date - 1
group by date_trunc('day', table.date)
order by 2, 1
I only want to sum the amount when product = 13, 14, 15 or 16
An example of the product, date and amount would be like this:
product amount date
8 4750 19/03/2019 00:21
14 7840 12/04/2019 22:40
14 15000 22/03/2019 18:27
14 11715 19/03/2019 00:12
14 1054 22/03/2019 18:22
14 18491 17/03/2019 14:28
14 12253 17/03/2019 14:30
14 27600 17/03/2019 14:32
14 3936 17/03/2019 14:28
14 19007 19/03/2019 00:14
8 9400 19/03/2019 00:21
8 4750 19/03/2019 00:21
8 25000 19/03/2019 00:17
14 10346 22/03/2019 18:23
I would like a metric that always calculates the sum of the product value today up to the current moment (only when "product" is 13, 14, 15 or 16), as well as the same metric for yesterday. For example, if it's 1 PM now, I want today's TPV until 1 PM and yesterday's TPV until 1 PM as well.
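One way to express "up to the same moment as today" is to compare only the time-of-day component of each timestamp against the cutoff. A sketch of that comparison in Python (not from the thread; `tpv_until` is a hypothetical helper):

```python
from datetime import date, datetime, time

# (product, amount, timestamp) rows, taken from the sample above.
rows = [
    (8, 4750, datetime(2019, 3, 19, 0, 21)),
    (14, 11715, datetime(2019, 3, 19, 0, 12)),
    (14, 19007, datetime(2019, 3, 19, 0, 14)),
    (14, 15000, datetime(2019, 3, 22, 18, 27)),
]

def tpv_until(rows, day: date, cutoff: time,
              products=frozenset({13, 14, 15, 16})):
    # Sum amounts for the given day, but only rows whose time of day
    # is at or before the cutoff, and only for the products of interest.
    return sum(amount for product, amount, ts in rows
               if product in products
               and ts.date() == day
               and ts.time() <= cutoff)
```

In Postgres the same idea is typically written by casting to a time of day, e.g. `date::time <= now()::time`, combined with a check on the day itself.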

PostgreSQL - filter function for dates

I am trying to use the built-in filter function in PostgreSQL to filter for a date range in order to sum only entries falling within this time-frame.
I cannot understand why the filter isn't being applied.
I am trying to filter for all product transactions that have a created_at date of the previous month (so in this case that were created in June 2017).
SELECT pt.created_at::date, pt.customer_id,
       sum(pt.amount/100::double precision)
           filter (where date_part('month', pt.created_at) = date_part('month', NOW() - interval '1 month')
                     and date_part('year', pt.created_at) = date_part('year', NOW()))
from product_transactions pt
LEFT JOIN customers c ON c.id = pt.customer_id
GROUP BY pt.created_at::date, pt.customer_id
Below are my expected results (the sum of the amount for each day of the previous month, for each customer_id that has an entry on that day) and the actual results I get from the query (using date_trunc).
Expected results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
Results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
2017-05-30 XX YYY
2017-05-25 XX YYY
2017-05-15 XX YYY
2017-04-30 XX YYY
2017-03-02 XX YYY
2016-11-02 XX YYY
The actual results give me sums for all dates in the database, so no date range is being applied, for a reason I cannot understand. I'm seeing dates that are not in June 2017 and even dates from previous years.
Use the date_trunc(..) function, which compares month and year in a single step. Note also that FILTER only restricts the rows fed into the aggregate; groups outside the range still appear (with NULL sums) unless you exclude them with a WHERE clause as well:
SELECT pt.created_at::date, pt.customer_id, c.name,
       sum(pt.amount/100::double precision)
           filter (where date_trunc('month', pt.created_at) = date_trunc('month', NOW() - interval '1 month'))
from product_transactions pt
LEFT JOIN customers c ON c.id = pt.customer_id
GROUP BY pt.created_at::date, pt.customer_id, c.name
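Why date_trunc is safer than comparing month and year separately becomes clear at a year boundary. A sketch in Python (`month_floor` is a hypothetical helper mirroring date_trunc('month', ...); subtracting one day from the month's first day approximates now - interval '1 month'):

```python
from datetime import date, timedelta

def month_floor(d: date) -> date:
    # Equivalent of date_trunc('month', d) for dates.
    return d.replace(day=1)

def in_previous_month(d: date, today: date) -> bool:
    # date_trunc approach: one comparison handles the year boundary too.
    prev = month_floor(month_floor(today) - timedelta(days=1))
    return month_floor(d) == prev

def in_previous_month_buggy(d: date, today: date) -> bool:
    # The original query's approach: month of (now - 1 month), year of now.
    prev_month = (month_floor(today) - timedelta(days=1)).month
    return d.month == prev_month and d.year == today.year
```

In January the buggy version looks for December of the *current* year, so it misses December of the year before; both versions agree mid-year.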

Trigger an event when the event count is the maximum of the last 12 months window

I have a requirement like this: trigger an event when the idle well count is the maximum of the last 12 months window.
For Example:
Well_date Count
1986-01-01 00:00:00 17
1986-02-01 00:00:00 16
1986-03-01 00:00:00 23
1986-04-01 00:00:00 33
1986-05-01 00:00:00 31
1986-06-01 00:00:00 42
1986-07-01 00:00:00 43
1986-08-01 00:00:00 43
1986-09-01 00:00:00 41
1986-10-01 00:00:00 42
1986-11-01 00:00:00 46
1986-12-01 00:00:00 52
Output:
1986-12-01 00:00:00 52
Conversely, if the event count is, say, only the minimum of the last 11 months, it should be ignored.
Thanks in advance.
This one will give you a stream of the last maximum well counts, i.e. the max excluding the current event:
insert into LastMaxStream select rstream max(well_count) as lastMax from SomeEvent
The LastMaxStream can be used to compare:
#name('out') select * from SomeEvent(well_count > (select lastMax from LastMaxStream.std:lastevent()));
There may be other solutions, but that is the one that comes to mind. To restrict it to some time period, add that to the group-by clause, or declare a context that starts when 1986 starts and ends when 1986 ends, for example.
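Outside Esper, the underlying rule (fire whenever a new count exceeds the maximum of the preceding window) can be sketched in plain Python; `detect_new_max` is a hypothetical name, not an Esper API:

```python
from collections import deque

def detect_new_max(counts, window: int = 12):
    """Return (index, count) pairs where count exceeds the max of the
    preceding `window` values -- the events that should fire."""
    recent = deque(maxlen=window)
    hits = []
    for i, c in enumerate(counts):
        if recent and c > max(recent):
            hits.append((i, c))
        recent.append(c)
    return hits

# Well counts from the example, Jan through Dec 1986.
counts = [17, 16, 23, 33, 31, 42, 43, 43, 41, 42, 46, 52]
```

On the sample data the final firing is for December's count of 52, matching the expected output; intermediate new maxima (e.g. June's 42) also fire as they stream in, which is the behaviour of the EPL solution above.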