Postgres version: 10
Table example:
CREATE TABLE log (
group_id INTEGER,
log_begin TIMESTAMP,
log_end TIMESTAMP
);
My goal: for each row, within its group, I want the log_begin of the first log that begins after the current row's log ends, or NULL if no such log exists. Example: the log in row 1 ends at 2022-07-15 15:30:00 and the next log begins at 2022-07-15 16:00:00, so 2022-07-15 16:00:00 is the answer. The log in row 4 ends at 2022-07-15 15:20:00 and the next log that begins after that starts at 2022-07-15 15:30:00, so that is the answer.
Example data:
group_id  log_begin            log_end
1         2022-07-15 15:00:00  2022-07-15 15:30:00
1         2022-07-15 16:00:00  2022-07-15 16:30:00
1         2022-07-15 17:00:00  2022-07-15 17:30:00
2         2022-07-15 15:00:00  2022-07-15 15:20:00
2         2022-07-15 15:15:00  2022-07-15 15:40:00
2         2022-07-15 15:30:00  2022-07-15 16:30:00
My first solution was to use a subquery that searches for the next value for every row. The result is correct, but the table is very big, so the query is very slow. Something like this:
SELECT *,
       ( SELECT _L.log_begin
         FROM log _L
         WHERE _L.log_begin > L.log_end
           AND _L.group_id = L.group_id
         ORDER BY _L.log_begin ASC
         LIMIT 1 ) AS next_log_begin
FROM log L
My second solution was to use a window function such as LEAD, as below:
SELECT *, LEAD( log_begin, 1 ) OVER ( PARTITION BY group_id ORDER BY log_begin ) AS next_log_begin
FROM log
but the result isn't correct:
group_id  log_begin            log_end              next_log_begin
1         2022-07-15 15:00:00  2022-07-15 15:30:00  2022-07-15 16:00:00
1         2022-07-15 16:00:00  2022-07-15 16:30:00  2022-07-15 17:00:00
1         2022-07-15 17:00:00  2022-07-15 17:30:00  NULL
2         2022-07-15 15:00:00  2022-07-15 15:20:00  2022-07-15 15:15:00
2         2022-07-15 15:15:00  2022-07-15 15:40:00  2022-07-15 15:30:00
2         2022-07-15 15:30:00  2022-07-15 16:30:00  NULL
This is wrong because row 4 should get 2022-07-15 15:30:00 instead, and row 5 should be NULL.
Correct output:
group_id  log_begin            log_end              next_log_begin
1         2022-07-15 15:00:00  2022-07-15 15:30:00  2022-07-15 16:00:00
1         2022-07-15 16:00:00  2022-07-15 16:30:00  2022-07-15 17:00:00
1         2022-07-15 17:00:00  2022-07-15 17:30:00  NULL
2         2022-07-15 15:00:00  2022-07-15 15:20:00  2022-07-15 15:30:00
2         2022-07-15 15:15:00  2022-07-15 15:40:00  NULL
2         2022-07-15 15:30:00  2022-07-15 16:30:00  NULL
Is there any way to do this in Postgres 10?
Window functions are preferable but not required.
The data and the results you expect to see don't appear to line up with the logic you've outlined, but I think I get what you are saying.
If I understand you correctly, you want to look at the "next log begin" for every record, sorted by group then log start. If this is the case, you want to omit the "partition by" because it will yield a null any time the group id changes. It executes the lead within groups of whatever value(s) you specify in partition by, in this case group_id. So, for starters:
select
group_id, log_begin, log_end,
lead (log_begin) over (order by group_id, log_begin) as x
from log
Which looks for the next record, independent of changes to the group.
There is no way I'm aware of to evaluate the result of a window function within the expression that invokes it, so to do this you essentially would need to wrap it in a CTE and then evaluate it:
with cte as (
select
group_id, log_begin, log_end,
lead (log_begin) over (order by group_id, log_begin) as x
from log
)
select
group_id, log_begin, log_end,
x
from cte
And now you can compare x to any other field. I think the new field you want would look like this:
case
when log_end < x then x
end as next_log_begin
But again, it does not match your desired results. So either I misunderstood, your sample data might be off, or your assumptions might be off. All are equally possible.
Full query example:
with cte as (
  select
    group_id, log_begin, log_end,
    lead (log_begin) over (order by group_id, log_begin) as x
  from log
)
select
  group_id, log_begin, log_end,
  x,
  case
    when log_end < x then x
  end as next_log_begin
from cte
-- EDIT 7/18/2022 --
I think I see now, based on your revised question. I can't promise this will be efficient, but if you use a scalar subquery I think it will do what you want. Try this and let me know.
select
  group_id, log_begin, log_end,
  (select min (log_begin)
   from log l2
   where l1.group_id = l2.group_id
     and l2.log_begin > l1.log_end) as next_log_begin
from log l1
order by group_id, log_begin
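One way to sanity-check the scalar subquery against the sample data from the question is to run it in an in-memory database. The sketch below uses Python's sqlite3 module rather than Postgres 10 (an assumption made only so the example is self-contained, with timestamps stored as ISO text); the query text is the same. On a large Postgres table, an index on (group_id, log_begin) is the usual way to make this kind of lookup cheaper, though whether it is enough for this table is untested here.

```python
import sqlite3

# Sample rows from the question: (group_id, log_begin, log_end).
ROWS = [
    (1, "2022-07-15 15:00:00", "2022-07-15 15:30:00"),
    (1, "2022-07-15 16:00:00", "2022-07-15 16:30:00"),
    (1, "2022-07-15 17:00:00", "2022-07-15 17:30:00"),
    (2, "2022-07-15 15:00:00", "2022-07-15 15:20:00"),
    (2, "2022-07-15 15:15:00", "2022-07-15 15:40:00"),
    (2, "2022-07-15 15:30:00", "2022-07-15 16:30:00"),
]

QUERY = """
select group_id, log_begin, log_end,
       (select min(l2.log_begin)
        from log l2
        where l2.group_id = l1.group_id
          and l2.log_begin > l1.log_end) as next_log_begin
from log l1
order by group_id, log_begin
"""


def next_log_begins():
    """Load the sample rows into SQLite and run the scalar-subquery version."""
    conn = sqlite3.connect(":memory:")
    # ISO-formatted text timestamps compare correctly as strings.
    conn.execute("create table log (group_id integer, log_begin text, log_end text)")
    conn.executemany("insert into log values (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in next_log_begins():
        print(row)
```

The last column of each row should match the "Correct output" table in the question, including NULL (None in Python) for row 5.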
Related
I have a table like below (tablename: sales)
sales_datetime       sales  salesman
2022-08-01 09:00:00  100    John
2022-08-01 11:00:00  200    John
2022-08-02 10:00:00  100    Peter
2022-08-02 13:00:00  300    John
2022-08-04 14:00:00  300    Peter
2022-08-05 12:00:00  100    John
2022-08-05 16:00:00  200    John
From that table I want to summarize sales over a 5-day period for each salesman. The summary table I want looks like this:
periode     total_sales  salesman
2022-08-01  300          John
2022-08-01  0            Peter
2022-08-02  300          John
2022-08-02  100          Peter
2022-08-03  0            John
2022-08-03  0            Peter
2022-08-04  0            John
2022-08-04  300          Peter
2022-08-05  300          John
2022-08-05  0            Peter
I wrote the following query (Postgres), but the results are not what I want. Assume today is 2022-08-05.
with dateseries as
(select generate_series(current_date-'4 days'::interval,
current_date::date,
'1 day'::interval)::date as periode)
select d.periode,coalesce(sum(s.sales),0) as total_sales,s.salesman from dateseries d
left outer join sales s
on d.periode=s.sales_datetime::date
group by d.periode, s.salesman order by d.periode
results:
periode     total_sales  salesman
2022-08-01  300          John
2022-08-02  300          John
2022-08-02  100          Peter
2022-08-03  0            (NULL)
2022-08-04  300          Peter
2022-08-05  300          John
Any advice would be great. Thank you.
Step by step: first aggregate the daily sales per salesperson (aggregated_sales CTE), create a list of days to report (days CTE), create a list of salesmen (salesmen CTE), and then query the sales for each day/salesman pair.
with aggregated_sales as
(
  select sales_datetime::date sales_date, sum(sales) sales, salesman
  from sales group by sales_datetime::date, salesman
),
days(sales_date) as
(
  select d::date
  from generate_series('2022-08-01', '2022-08-08', interval '1 day') d
),
salesmen (salesman) as
(
  select distinct salesman from sales
)
select sales_date, coalesce(sales, 0) sales, salesman
from (select * from days cross join salesmen) fl
left outer join aggregated_sales ags using (sales_date, salesman);
The query may be shorter if CTEs are inlined yet I think that clarity and readability are more important than mere size.
In order to "make a summary sales for 5 days period for each salesman" replace generate_series('2022-08-01', '2022-08-08', interval '1 day') with generate_series(current_date - 4, current_date, interval '1 day').
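The same three-step shape (aggregate, build the day/salesman grid, left join) can be tried on the question's sample data. This is a sketch in Python with sqlite3, not Postgres: generate_series is replaced by a recursive CTE, and the date range is hard-coded to 2022-08-01 through 2022-08-05. Both are assumptions made to keep the example self-contained.

```python
import sqlite3

# Sample rows from the question: (sales_datetime, sales, salesman).
SALES = [
    ("2022-08-01 09:00:00", 100, "John"),
    ("2022-08-01 11:00:00", 200, "John"),
    ("2022-08-02 10:00:00", 100, "Peter"),
    ("2022-08-02 13:00:00", 300, "John"),
    ("2022-08-04 14:00:00", 300, "Peter"),
    ("2022-08-05 12:00:00", 100, "John"),
    ("2022-08-05 16:00:00", 200, "John"),
]

QUERY = """
with recursive days(sales_date) as (
    -- stand-in for generate_series: one row per day in the fixed range
    select '2022-08-01'
    union all
    select date(sales_date, '+1 day') from days where sales_date < '2022-08-05'
),
aggregated_sales as (
    select date(sales_datetime) as sales_date, salesman, sum(sales) as sales
    from sales
    group by date(sales_datetime), salesman
),
salesmen as (
    select distinct salesman from sales
)
select d.sales_date, coalesce(a.sales, 0) as sales, s.salesman
from days d
cross join salesmen s
left join aggregated_sales a
       on a.sales_date = d.sales_date and a.salesman = s.salesman
order by d.sales_date, s.salesman
"""


def daily_summary():
    """Return one (day, total, salesman) row per day/salesman pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales (sales_datetime text, sales integer, salesman text)")
    conn.executemany("insert into sales values (?, ?, ?)", SALES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in daily_summary():
        print(row)
```

Because the grid of days and salesmen is built before the join, days without sales still produce a row with a total of 0 for every salesman.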
You wrote: "the results were not same as I want. Assume today is 2022-08-05"
Please note that '2022-08-05'::date - '5 days'::interval will give you 2022-07-31, and not 2022-08-01 as you assume. Because of that, I think you meant it to be current_date - '4 days'::interval.
With that out of the way, here is one possible query:
with sales_by_date as (
  select
    salesman,
    sales_datetime::date,
    sum(sales) total_sales
  from sales
  where
    -- assuming you need to have totals for salesmen that had sales in specified period only
    sales_datetime::date between current_date-'4 days'::interval and current_date
  group by
    salesman,
    sales_datetime::date),
dateseries as (
  select
    distinct salesman,
    generate_series(current_date-'4 days'::interval, current_date, '1 day'::interval)::date as periode
  from sales_by_date)
select
  d.periode,
  coalesce(s.total_sales, 0) total_sales,
  d.salesman
from dateseries d
left join sales_by_date s
  on d.periode = s.sales_datetime
  and d.salesman = s.salesman
order by d.periode, d.salesman;
But you still have to figure out some requirements for this problem. E.g. what if for the specified period there are no sales at all in the sales table?
I hope someone can help me on this one. :-)
I want to count runs of consecutive holiday days to see whether anyone had more than three days of holiday in a row. In other words, it is not enough to count the total number of days; the days have to be consecutive. The example data below shows three people, each with their own holiday days. Person 1234 has two periods of two consecutive days, so this person has no period above three days, because there is a day in between the two periods (the 3rd). Persons 1235 and 1236 each have one period above three days. The time of day in the timestamps is irrelevant, so the data can be treated as plain dates.
What I have:
ID    Start
1234  2022-01-01 00:00:00
1234  2022-01-02 00:00:00
1234  2022-01-04 06:50:00
1234  2022-01-05 06:50:00
1235  2022-01-04 06:50:00
1235  2022-01-05 06:50:00
1235  2022-01-06 00:00:00
1236  2022-01-01 00:00:00
1236  2022-01-02 00:00:00
1236  2022-01-03 06:50:00
1236  2022-01-04 06:50:00
1236  2022-01-05 06:50:00
1236  2022-01-08 00:00:00
What I hope to get:
ID    N holidays > 3 days
1234  0
1235  1
1236  1
Anyways, any help will be appreciated!
Kind regards,
Jacob
This is a "gaps and islands" problem. You first need to group the data into "islands", which in your case are groups of consecutive holidays, and then summarize them in your final result set.
Side note: your question asks for greater than 3 days, but your expected output uses greater than or equal to 3, so I used that instead.
DROP TABLE IF EXISTS #Holiday;
DROP TABLE IF EXISTS #ConsecutiveHoliday
CREATE TABLE #Holiday (ID INT,StartDateTime DATETIME)
INSERT INTO #Holiday
VALUES (1234,'2022-01-01 00:00:00')
,(1234,'2022-01-02 00:00:00')
,(1234,'2022-01-04 06:50:00')
,(1234,'2022-01-05 06:50:00')
,(1235,'2022-01-04 06:50:00')
,(1235,'2022-01-05 06:50:00')
,(1235,'2022-01-06 00:00:00')
,(1236,'2022-01-01 00:00:00')
,(1236,'2022-01-02 00:00:00')
,(1236,'2022-01-03 06:50:00')
,(1236,'2022-01-04 06:50:00')
,(1236,'2022-01-05 06:50:00')
,(1236,'2022-01-08 00:00:00');
WITH cte_Previous AS (
SELECT A.ID,B.StartDate
,IsHolidayConsecutive = CASE WHEN DATEADD(day,-1,StartDate) /*Current day minus 1*/ = LAG(StartDate) OVER (PARTITION BY ID ORDER BY StartDate) /*Previous holiday date*/
THEN 0
ELSE 1
END
FROM #Holiday AS A
CROSS APPLY (SELECT StartDate = CAST(StartDateTime AS DATE)) AS B
),
cte_Groups AS (
SELECT *,GroupID = SUM(IsHolidayConsecutive) OVER (PARTITION BY ID ORDER BY StartDate)
FROM cte_Previous
)
/*Groups of holidays taken consecutively*/
SELECT ID
,StartDate = MIN(StartDate)
,EndDate = MAX(StartDate)
,NumOfDays = COUNT(*)
INTO #ConsecutiveHoliday
FROM cte_Groups
GROUP BY ID,GroupID
ORDER BY ID,StartDate
/*See list of consecutive holidays taken*/
SELECT *
FROM #ConsecutiveHoliday
/*Formatted result*/
SELECT ID
,[N holidays >= 3 days] = COUNT(CASE WHEN NumOfDays >= 3 THEN 1 END)
FROM #ConsecutiveHoliday
GROUP BY ID
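The island-building step does not depend on SQL Server. As a rough illustration of the same idea in plain Python (the function name and data layout are made up for this sketch), consecutive dates share the same value of "date ordinal minus position in the sorted list", so grouping on that difference yields the islands. Like the query above, it counts islands of at least 3 days.

```python
from collections import defaultdict
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (ID, start timestamp).
HOLIDAYS = [
    (1234, "2022-01-01 00:00:00"),
    (1234, "2022-01-02 00:00:00"),
    (1234, "2022-01-04 06:50:00"),
    (1234, "2022-01-05 06:50:00"),
    (1235, "2022-01-04 06:50:00"),
    (1235, "2022-01-05 06:50:00"),
    (1235, "2022-01-06 00:00:00"),
    (1236, "2022-01-01 00:00:00"),
    (1236, "2022-01-02 00:00:00"),
    (1236, "2022-01-03 06:50:00"),
    (1236, "2022-01-04 06:50:00"),
    (1236, "2022-01-05 06:50:00"),
    (1236, "2022-01-08 00:00:00"),
]


def count_long_holidays(rows, min_days=3):
    """Return {id: number of runs of consecutive dates with >= min_days days}."""
    dates_by_id = defaultdict(set)
    for person_id, stamp in rows:
        # Time of day is irrelevant, so keep only the date part.
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
        dates_by_id[person_id].add(day)

    counts = {}
    for person_id, dates in dates_by_id.items():
        ordered = sorted(dates)
        # Consecutive dates share the same (ordinal - index) value.
        keyed = [(d.toordinal() - i, d) for i, d in enumerate(ordered)]
        islands = [list(group) for _, group in groupby(keyed, key=lambda pair: pair[0])]
        counts[person_id] = sum(1 for island in islands if len(island) >= min_days)
    return counts


if __name__ == "__main__":
    print(count_long_holidays(HOLIDAYS))
```

Passing min_days=4 instead would implement the strict "more than three days" reading of the question.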
I have a table that is essentially a purchases table with purchase prices. Each purchase is recorded against an hour; for example, in the table below ABC-123 was purchased on 2022-1-20 at 12:00. I want the NULL values to show 20 until a new purchase price is entered, and the same for the other id_code.
id_code  hour             purchase_price
ABC-123  2022-1-20 12:00  20
ABC-123  2022-1-20 13:00  NULL
ABC-123  2022-1-20 14:00  NULL
BCD-123  2022-1-20 12:00  35
BCD-123  2022-1-20 13:00  36
BCD-123  2022-1-20 14:00  NULL
The output table should look like this, with the NULLs replaced by the previously available price for the same id_code:
id_code  hour             purchase_price
ABC-123  2022-1-20 12:00  20
ABC-123  2022-1-20 13:00  20
ABC-123  2022-1-20 14:00  20
BCD-123  2022-1-20 12:00  35
BCD-123  2022-1-20 13:00  36
BCD-123  2022-1-20 14:00  36
I did find a similar question here, but I think it does not work for me because my IDs are not incremental integers.
You can create a view that uses an aggregate function as a window function. Try this:
CREATE VIEW test_view AS
( SELECT id_code
, hour
, (array_agg(purchase_price) FILTER (WHERE purchase_price IS NOT NULL) OVER (PARTITION BY id_code ORDER BY hour DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))[1]
FROM test
)
Result :
id_code hour array_agg
ABC-123 2022-01-20 12:00:00 20
ABC-123 2022-01-20 13:00:00 20
ABC-123 2022-01-20 14:00:00 20
BCD-123 2022-01-20 12:00:00 35
BCD-123 2022-01-20 13:00:00 36
BCD-123 2022-01-20 14:00:00 36
see the demo in dbfiddle.
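To make the intent of the window expression concrete, here is a small Python sketch of the same "carry the last known price forward" rule applied to the sample rows. It only illustrates the logic and is not a replacement for the view; the function name is hypothetical.

```python
# Sample rows from the question: (id_code, hour, purchase_price); None stands for NULL.
ROWS = [
    ("ABC-123", "2022-01-20 12:00", 20),
    ("ABC-123", "2022-01-20 13:00", None),
    ("ABC-123", "2022-01-20 14:00", None),
    ("BCD-123", "2022-01-20 12:00", 35),
    ("BCD-123", "2022-01-20 13:00", 36),
    ("BCD-123", "2022-01-20 14:00", None),
]


def fill_forward(rows):
    """Replace each missing price with the latest earlier price for the same id_code."""
    last_price = {}
    filled = []
    # Process rows per id_code in chronological order.
    for id_code, hour, price in sorted(rows, key=lambda r: (r[0], r[1])):
        if price is not None:
            last_price[id_code] = price
        # Stays None if no price has been seen yet for this id_code.
        filled.append((id_code, hour, last_price.get(id_code)))
    return filled


if __name__ == "__main__":
    for row in fill_forward(ROWS):
        print(row)
```

Nothing here relies on the IDs being incremental integers; only the ordering by hour within each id_code matters.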
I am using the query below to fetch record counts month by month, but it gives wrong data:
SELECT
(count( server_time::timestamp::date)) ,
min(server_time::timestamp::date) as "Month Date"
FROM
complaint_details_v2
WHERE
server_time between '2018/08/01' and '2018/10/30'
GROUP BY
floor((server_time::timestamp::date - '2018/08/01'::date)/30)
ORDER BY
2 ASC
Result
Count Month Date
2774 2018-08-01
5893 2018-08-31
1193 2018-09-30
But the result should be
Count Month Date
2774 2018-08-01
5893 2018-09-01
1193 2018-10-01
Use date_trunc
demo:db<>fiddle
SELECT
count(*),
date_trunc('month', servertime)::date as month_date
FROM log
GROUP BY date_trunc('month', servertime)
ORDER BY 2
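The original grouping drifts because floor(days / 30) produces 30-day buckets counted from 2018-08-01, not calendar months, so the second and third buckets start on 2018-08-31 and 2018-09-30 (the dates seen in the question's result, assuming rows exist on those days). A short Python sketch of the two bucketing rules, with illustrative function names:

```python
from datetime import date, timedelta

START = date(2018, 8, 1)


def thirty_day_bucket_start(d):
    """Start of the 30-day bucket containing d, counted from START."""
    bucket = (d - START).days // 30
    return START + timedelta(days=30 * bucket)


def month_start(d):
    """Calendar-month start, the same idea as date_trunc('month', ...)."""
    return d.replace(day=1)


if __name__ == "__main__":
    for d in (date(2018, 8, 15), date(2018, 9, 15), date(2018, 10, 15)):
        print(d, thirty_day_bucket_start(d), month_start(d))
```

Truncating to the month start keeps every row of a calendar month in the same group regardless of how many days the month has.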
Please provide a Hive query that returns the last date of each month in 'yyyy-mm-dd' format for 3 years.
Substitute the start and end dates in this example with yours. How it works: the space() function generates a string of spaces whose length equals the number of days returned by datediff(); split() by space turns it into an array; posexplode() explodes the array and returns the position of each element, which corresponds to the day offset. Then date_add('${hivevar:start_date}', s.i) returns a date for each day, and the last_day() function (available in Hive since version 1.1) converts each date to the last day of its month (hence the distinct). Run this example:
set hivevar:start_date=2015-07-01;
set hivevar:end_date=current_date;
select distinct last_day(date_add ('${hivevar:start_date}',s.i)) as last_date
from ( select posexplode(split(space(datediff(${hivevar:end_date},'${hivevar:start_date}')),' ')) as (i,x)
) s
order by last_date
;
Output:
OK
2015-07-31
2015-08-31
2015-09-30
2015-10-31
2015-11-30
2015-12-31
2016-01-31
2016-02-29
2016-03-31
2016-04-30
2016-05-31
2016-06-30
2016-07-31
2016-08-31
2016-09-30
2016-10-31
2016-11-30
2016-12-31
2017-01-31
2017-02-28
2017-03-31
2017-04-30
2017-05-31
2017-06-30
2017-07-31
2017-08-31
2017-09-30
2017-10-31
2017-11-30
2017-12-31
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31
2018-06-30
2018-07-31
Time taken: 71.581 seconds, Fetched: 37 row(s)
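For cross-checking the Hive output outside the cluster, the same list can be produced in a few lines of Python. This sketch assumes an end date of 2018-07-15; the original run used current_date, whose actual value is not given (the output above only shows that it fell in July 2018). The function name is illustrative.

```python
import calendar
from datetime import date


def month_end_dates(start, end):
    """Last day of every month from start's month through end's month, as 'yyyy-mm-dd'."""
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        result.append(date(year, month, last_day).isoformat())
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


if __name__ == "__main__":
    # End date is an assumption standing in for current_date in the Hive example.
    for day in month_end_dates(date(2015, 7, 1), date(2018, 7, 15)):
        print(day)
```

Stepping month by month avoids generating one row per day, which is what the Hive version does before applying distinct.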