Consecutive records with a condition in PostgreSQL


create table weather_forecast (
    date            date,
    temperature     decimal,
    avg_humidity    decimal,
    avg_dewpoint    decimal,
    avg_barometer   decimal,
    avg_windspeed   decimal,
    avg_gutspeed    decimal,
    avg_direction   decimal,
    rainfall_month  decimal,
    rainfall_year   decimal,
    maxrain_permin  decimal,
    max_temp        decimal,
    min_temp        decimal,
    max_humidity    decimal,
    min_humidity    decimal,
    max_pressure    decimal,
    min_pressure    decimal,
    max_winspeed    decimal,
    max_gutspeed    decimal,
    maxheat_index   decimal,
    month           int,
    diff_pressure   decimal(7,5)
);
Whenever the maximum gust speed exceeds 55 mph, I need to fetch the details for the next 4 days. This is as far as I have got:
with t1 as (
    select max_gutspeed, date
    from weather_forecast
    where max_gutspeed > 55
), t2 as (
    select date, row_number() over () as rn
    from weather_forecast
)
select distinct rn, t1.date, t1.max_gutspeed
from t1
inner join t2 on t1.date = t2.date
order by t2.rn asc;
Also, how do I find the maximum and minimum number of consecutive days on which the temperature dropped?
I tried doing it with temporary tables, but I am stuck and don't know what to do next.

For each date where max_gutspeed exceeds 55, you can construct a daterange from that date to 4 days later (both ends inclusive, '[]'). Then list every row whose date is contained (<@) in ANY of those ranges.
select *
from weather_forecast t
where t.date <@ ANY (
    select daterange(date, (date + '4 days'::interval)::date, '[]')
    from weather_forecast
    where max_gutspeed > 55 )
order by t.date asc;
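The range-containment idea can be modelled outside the database to check which rows it keeps. This Python sketch uses made-up sample rows and a hypothetical helper name, rows_in_gust_windows; each inclusive (start, end) pair plays the role of daterange(date, date + 4, '[]'), and the any(...) test plays the role of <@ ANY.

```python
from datetime import date, timedelta

# Hypothetical sample rows (date, max_gutspeed); not taken from the question.
rows = [
    (date(2021, 3, 1), 40),
    (date(2021, 3, 2), 60),   # gust > 55: window 2 Mar to 6 Mar
    (date(2021, 3, 3), 30),
    (date(2021, 3, 6), 35),
    (date(2021, 3, 7), 20),   # falls outside every window
    (date(2021, 3, 10), 58),  # gust > 55: window 10 Mar to 14 Mar
    (date(2021, 3, 12), 25),
]

def rows_in_gust_windows(rows, threshold=55, days=4):
    """Rows whose date lies in any inclusive window [d, d + days],
    where d is a date whose gust speed exceeds the threshold."""
    # Equivalent of daterange(date, date + 4, '[]') for each triggering row
    windows = [(d, d + timedelta(days=days)) for d, gust in rows if gust > threshold]
    # Equivalent of date <@ ANY(ranges), sorted like ORDER BY date
    return sorted(r for r in rows if any(lo <= r[0] <= hi for lo, hi in windows))

for d, gust in rows_in_gust_windows(rows):
    print(d, gust)
```

Because the range is closed on both ends, the triggering day itself is returned along with the 4 days after it, and overlapping windows do not produce duplicate rows.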

Related

How to sum for previous n number of days for a number of dates in PostgreSQL

I have a list of dates, each with a value, in PostgreSQL.
For each date I want to sum the value for this date and the previous 4 days.
I also want to sum the values from the start of that month up to the current date. For example:
For 07/02/2021 sum all values from 07/02/2021 to 01/02/2021
For 06/02/2021 sum all values from 06/02/2021 to 01/02/2021
For 31/01/2021 sum all values from 31/01/2021 to 01/01/2021
The output should look like this (it will be created as two separate tables):
[Image: expected output]
Any help would be appreciated.
Thanks

Sample data and structure: dbfiddle
For the first part of the query:
select date,
value,
sum(value) over (
order by to_date(date, 'DD/MM/YYYY')
rows between 4 preceding and current row) as five_day_period
from your_table_name
order by to_date(date, 'DD/MM/YYYY') desc;
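A ROWS frame counts rows, not calendar days, so "4 preceding" equals "the previous 4 days" only when there is exactly one row per date. This Python sketch, with hypothetical sample values and a hypothetical function name, mimics the same frame on text dates in DD/MM/YYYY format:

```python
from datetime import datetime

# Hypothetical (date_text, value) rows, dates stored as 'DD/MM/YYYY' text as in the question.
rows = [("03/02/2021", 3), ("01/02/2021", 1), ("02/02/2021", 2), ("04/02/2021", 4),
        ("05/02/2021", 5), ("06/02/2021", 6), ("07/02/2021", 7)]

def five_row_sums(rows):
    """Mimic SUM(value) OVER (ORDER BY to_date(date) ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)."""
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[0], "%d/%m/%Y"))
    out = []
    for i, (d, v) in enumerate(ordered):
        frame = ordered[max(0, i - 4): i + 1]  # up to 4 preceding rows plus the current one
        out.append((d, v, sum(val for _, val in frame)))
    return out

for d, v, total in five_row_sums(rows):
    print(d, v, total)
```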
For the second part of the query:
select date,
value,
sum(value)
over (
partition by regexp_replace(date, '[0-9]{2}/(.+)', '\1')
order by to_date(date, 'DD/MM/YYYY')
rows between unbounded preceding and current row) as month_to_date
from your_table_name
order by to_date(date, 'DD/MM/YYYY') desc;
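The regexp_replace call turns '07/02/2021' into '02/2021', so the partition key is the month and year, and the running total restarts at each new month. A rough Python equivalent, using hypothetical rows and a hypothetical function name:

```python
from datetime import datetime
from itertools import groupby

# Hypothetical rows spanning a month boundary.
rows = [("30/01/2021", 10), ("31/01/2021", 20), ("01/02/2021", 1),
        ("02/02/2021", 2), ("03/02/2021", 3)]

def month_to_date(rows):
    """Running total per 'MM/YYYY' partition, ordered by the parsed date."""
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[0], "%d/%m/%Y"))
    out = []
    # r[0][3:] is the 'MM/YYYY' part, the same key the regexp_replace extracts;
    # rows of one month are contiguous because the list is sorted by date.
    for _, month_rows in groupby(ordered, key=lambda r: r[0][3:]):
        running = 0
        for d, v in month_rows:
            running += v
            out.append((d, v, running))
    return out

for d, v, total in month_to_date(rows):
    print(d, v, total)
```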

Generating series Postgres

I want to group rows by day, week, month, or whatever interval I set.
Following this solution, it works when the granularity is a month, but with an interval of 1 week no records are returned.
These are the rows in my table:
[Screenshot: rows in my table]
This is my current query for a per-month interval, which works perfectly.
SELECT *
FROM (
SELECT day::date
FROM generate_series(timestamp '2018-09-01'
, timestamp '2018-12-01'
, interval '1 month') day
) d
LEFT JOIN (
SELECT date_trunc('month', created_date)::date AS day
, SUM(escrow_amount) AS profit, sum(total_amount) as revenue
FROM (
select distinct on (order_id) order_id, escrow_amount, total_amount, created_date from order_item
WHERE created_date >= date '2018-09-01'
AND created_date <= date '2018-12-01'
-- AND ... more conditions
) t2 GROUP BY 1
) t USING (day)
ORDER BY day;
Result from this query:
[Screenshot: result of the per-month query]
And this is the per-week query. I have reduced the range to two months for brevity.
SELECT *
FROM (
SELECT day::date
FROM generate_series(timestamp '2018-09-01'
, timestamp '2018-11-01'
, interval '1 week') day
) d
LEFT JOIN (
SELECT date_trunc('week', created_date)::date AS day
, SUM(escrow_amount) AS profit, sum(total_amount) as revenue
FROM (
select distinct on (order_id) order_id, escrow_amount, total_amount, created_date from order_item
WHERE created_date >= date '2018-09-01'
AND created_date <= date '2018-11-01'
-- AND ... more conditions
) t2 GROUP BY 1
) t USING (day)
ORDER BY day;
Note that I have records from October, but the result shows nothing for the October dates.
Any idea what I am missing here?

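The dates produced by generate_series in your first subquery are not truncated to the beginning of the week. The series starts on 2018-09-01, a Saturday, so every generated date is a Saturday, while date_trunc('week', ...) always returns a Monday:
date_trunc('week', '2018-09-01'::date)::date
is equal to
'2018-08-27'::date
so your join USING (day) never matches:
'2018-09-01'::date <> '2018-08-27'::date
Your query should look more like this: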
SELECT *
FROM (
SELECT day::date
FROM generate_series(date_trunc('week',timestamp '2018-09-01') --series begin trunc
, timestamp '2018-11-01'
, interval '1 week') day
) d
LEFT JOIN (
SELECT date_trunc('week', created_date::date)::date AS day
, SUM(escrow_amount) AS profit, sum(total_amount) as revenue
FROM (
select distinct on (order_id) order_id, escrow_amount, total_amount, created_date from order_item
WHERE created_date::date >= date '2018-09-01'
AND created_date::date <= date '2018-11-01'
-- AND ... more conditions
) t2 GROUP BY 1
) t USING (day)
WHERE day >= '2018-09-01' --to skip days from the beginning of the week to the beginning of the series before trunc
ORDER BY day;
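The mismatch can be reproduced outside PostgreSQL. In this Python sketch, trunc_week is a hypothetical helper that mimics date_trunc('week', ...)::date (truncation to Monday), and the order date 2018-10-10 is made up for illustration:

```python
from datetime import date, timedelta

def trunc_week(d):
    """Monday of d's week, like date_trunc('week', d)::date in PostgreSQL."""
    return d - timedelta(days=d.weekday())

def weekly_series(start, end):
    """Like generate_series(start, end, interval '1 week'), end inclusive."""
    return [start + timedelta(weeks=i) for i in range((end - start).days // 7 + 1)]

start, end = date(2018, 9, 1), date(2018, 11, 1)
raw = weekly_series(start, end)                  # original query: every date is a Saturday
aligned = weekly_series(trunc_week(start), end)  # fixed query: every date is a Monday

bucket = trunc_week(date(2018, 10, 10))  # the day an order on 2018-10-10 is grouped under
print(bucket, bucket in raw, bucket in aligned)
```

Only the aligned series contains the Monday buckets produced by the aggregate subquery, which is why the truncated series start makes the USING (day) join match.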

How does this Time Difference Calculation work?

I wanted to display the difference in HH:MM:SS between two datetime fields in SQL Server 2014.
I found a solution in this Stack Overflow post, and it works perfectly, but I want to understand why it arrives at the correct answer.
T-SQL:
SELECT y.CustomerID ,
y.createDate ,
y.HarvestDate ,
y.DateDif ,
DATEDIFF ( DAY, 0, y.DateDif ) AS [Days] ,
DATEPART ( HOUR, y.DateDif ) AS [Hours] ,
DATEPART ( MINUTE, y.DateDif ) AS [Minutes]
FROM (
SELECT x.createDate - x.HarvestDate AS [DateDif] ,
x.createDate ,
x.HarvestDate ,
x.CustomerID
FROM (
SELECT CustomerID ,
HarvestDate ,
createDate
FROM dbo.CustomerHarvestReports
WHERE HarvestDate >= DATEADD ( MONTH, -6, GETDATE ())
) AS [x]
) AS [y]
ORDER BY DATEDIFF ( DAY, 0, y.DateDif ) DESC;
Results:
CustomerID  createDate               HarvestDate              DateDif                  Days  Hours  Minutes
1239090     2017-11-07 08:51:03.870  2017-10-14 11:39:49.540  1900-01-24 21:11:14.330  23    21     11
1239090     2017-11-07 08:51:04.823  2017-10-19 11:17:48.320  1900-01-19 21:33:16.503  18    21     33
1843212     2017-10-27 19:14:02.070  2017-10-21 10:49:57.733  1900-01-07 08:24:04.337  6     8      24
1843212     2017-10-27 19:14:03.057  2017-10-21 10:49:57.733  1900-01-07 08:24:05.323  6     8      24
The first column is the CustomerID; the second and third columns are the two datetimes I wanted the difference between. The fourth column is the difference between them, and it is one of the parts of the code I do not understand.
If you subtract two datetime fields like this (createDate - HarvestDate), why does the result fall in the year 1900?
And regarding DATEDIFF ( DAY, 0, y.DateDif ), what does the 0 mean? Does 0 stand for the date 1900-01-01?
It works, and for that I am grateful, but I was hoping for an explanation of why it works.

I've added some comments that should explain it:
SELECT y.CustomerID ,
y.createDate ,
y.HarvestDate ,
y.DateDif ,
DATEDIFF ( DAY, 0, y.DateDif ) AS [Days] , -- calculates the number of whole days between 0 and the difference
DATEPART ( HOUR, y.DateDif ) AS [Hours] , -- the number of hours between the two dates has already been cleverly
-- calculated in [DateDif], therefore, all that is required is to extract
-- that figure using DATEPART
DATEPART ( MINUTE, y.DateDif ) AS [Minutes] -- same explanation as [Hours]
FROM (
SELECT x.createDate - x.HarvestDate AS [DateDif] , -- calculates the difference expressed as a datetime;
-- 0 is '1900-01-01 00:00:00.000' as a datetime, so the
-- resulting datetime will be that plus the difference
x.createDate ,
x.HarvestDate ,
x.CustomerID
FROM (
SELECT CustomerID ,
HarvestDate ,
createDate
FROM dbo.CustomerHarvestReports
WHERE HarvestDate >= DATEADD ( MONTH, -6, GETDATE ())
) AS [x]
) AS [y]
ORDER BY DATEDIFF ( DAY, 0, y.DateDif ) DESC;
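A rough Python model of the first result row makes the "0 plus the difference" idea concrete. It is an approximation (Python's datetime is not SQL Server's datetime type, which rounds to about 1/300 of a second), and EPOCH is just a local name for what datetime 0 means in SQL Server:

```python
from datetime import datetime

EPOCH = datetime(1900, 1, 1)  # datetime 0 in SQL Server

create = datetime(2017, 11, 7, 8, 51, 3, 870000)
harvest = datetime(2017, 10, 14, 11, 39, 49, 540000)

# createDate - HarvestDate behaves like: 1900-01-01 00:00 plus the elapsed time
date_dif = EPOCH + (create - harvest)

# DATEDIFF(DAY, 0, DateDif): whole days after 1900-01-01 (valid for a non-negative gap)
days = (date_dif - EPOCH).days
# DATEPART(HOUR / MINUTE, DateDif): the time-of-day fields already hold the remainder
hours, minutes = date_dif.hour, date_dif.minute

print(date_dif, days, hours, minutes)
```

The printed values correspond to the first row of the results: 1900-01-24 21:11:14.33, 23 days, 21 hours, 11 minutes.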

Rolling sum per time interval per group

Table, data and task as follows.
See the SQL Fiddle link for demo data and expected results.
create table "data"
(
"item" int
, "timestamp" date
, "balance" float
, "rollingSum" float
);
insert into "data" ( "item", "timestamp", "balance", "rollingSum" ) values
( 1, '2014-02-10', -10, -10 )
, ( 1, '2014-02-15', 5, -5 )
, ( 1, '2014-02-20', 2, -3 )
, ( 1, '2014-02-25', 13, 10 )
, ( 2, '2014-02-13', 15, 15 )
, ( 2, '2014-02-16', 15, 30 )
, ( 2, '2014-03-01', 15, 45 );
I need to get all rows within a given time interval. The table does not hold a record per item for every possible date; only dates on which changes were applied are recorded (there can be n rows per timestamp per item).
If the interval does not start exactly on a stored timestamp, the latest timestamp before startdate (the nearest smaller neighbour) should supply the starting balance/rolling sum.
Expected results (time interval: startdate = '2014-02-13', enddate = '2014-02-20'):
"item", "timestamp" , "balance", "rollingSum"
1 , '2014-02-13' , -10 , -10
1 , '2014-02-15' , 5 , -5
1 , '2014-02-20' , 2 , -3
2 , '2014-02-13' , 15 , 15
2 , '2014-02-16' , 15 , 30
I checked questions like this and searched a lot, but haven't found a solution yet.
I don't think it's a good idea to extend the "data" table with one row per missing date per item, because the complete interval (smallest date to latest date per item) may span several years.
Thanks in advance!

select sum(balance)
from "data"
where "timestamp" >= (select max("timestamp") from "data" where "timestamp" <= 'startdate')
and "timestamp" <= 'enddate'
(Replace 'startdate' and 'enddate' with the actual dates.) I'm not sure what you mean by rolling sum.

Here is an attempt. It seems to give the right result, though it is not pretty; it would have been easier in SQL Server 2012+, which supports SUM() OVER (ORDER BY ...) with a window frame:
declare @from date = '2014-02-13'
declare @to date = '2014-02-20'
;with x as
(
select
item, timestamp, balance, row_number() over (partition by item order by timestamp, balance) rn
from (select item, timestamp, balance from data
union all
select distinct item, @from, null from data) z  -- synthetic start row per item
where timestamp <= @to
)
, y as
(
select item,
timestamp,
coalesce(balance, rollingsum) balance ,  -- the synthetic row shows the carried-forward total
a.rollingsum,
rn
from x d
cross apply
(select sum(balance) rollingsum from x where rn <= d.rn and d.item = item) a
where timestamp between @from and @to
)
select item, timestamp, balance, rollingsum from y
where rollingsum is not null
order by item, rn, timestamp
Result:
item timestamp balance rollingsum
1 2014-02-13 -10,00 -10,00
1 2014-02-15 5,00 -5,00
1 2014-02-20 2,00 -3,00
2 2014-02-13 15,00 15,00
2 2014-02-16 15,00 30,00
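The rule behind the expected results (carry the running total of earlier rows into a synthetic row at startdate when nothing is stored exactly on that date) can be checked outside SQL Server. This Python sketch recomputes the rolling sum from balance rather than reading the stored rollingSum column; rolling_in_interval is a hypothetical name, and the data is the question's sample:

```python
from collections import defaultdict
from datetime import date

# The question's sample rows: (item, timestamp, balance)
data = [
    (1, date(2014, 2, 10), -10), (1, date(2014, 2, 15), 5),
    (1, date(2014, 2, 20), 2),   (1, date(2014, 2, 25), 13),
    (2, date(2014, 2, 13), 15),  (2, date(2014, 2, 16), 15),
    (2, date(2014, 3, 1), 15),
]

def rolling_in_interval(data, start, end):
    by_item = defaultdict(list)
    for item, ts, bal in data:
        by_item[item].append((ts, bal))
    out = []
    for item in sorted(by_item):
        rows = sorted(by_item[item])
        before = [bal for ts, bal in rows if ts < start]
        running = sum(before)  # total carried in from rows before the interval
        # Synthetic start row only if earlier rows exist and none is stored on start
        if before and all(ts != start for ts, _ in rows):
            out.append((item, start, running, running))
        for ts, bal in rows:
            if start <= ts <= end:
                running += bal
                out.append((item, ts, bal, running))
    return out

for row in rolling_in_interval(data, date(2014, 2, 13), date(2014, 2, 20)):
    print(*row)
```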

T-SQL duration in hours:minutes:seconds

I have an average duration between several dates as a DATETIME, e.g. 1900-01-01 01:30:00.000.
How can I convert this DATETIME to hours:minutes:seconds, where the hours can be more than 24? The output can be a VARCHAR.
For example:
1 day 12 hours 5 minutes converts to 36:05:00
2 days 1 hour 10 minutes 5 seconds converts to 49:10:05
etc.
DECLARE @date1 DATETIME = '2011-08-03 13:30'
DECLARE @date2 DATETIME = '2011-08-03 13:00'
DECLARE @date3 DATETIME = '2011-08-03 14:00'
DECLARE @abc DATETIME = '2011-08-03 12:00'
select CAST(AVG(CAST(data-@abc as float)) as datetime)
from
(
select 'data' as label, @date1 as data
union all
select 'data' as label, @date2 as data
union all
select 'data' as label, @date3 as data
) as a
group by label
I would like to get the result 01:30:00, meaning the average time is 1 hour and 30 minutes.
I tried this:
CONVERT(VARCHAR(10), CAST(AVG(CAST(data-@abc as float)) as datetime), 108)
but then I get only the time portion as HH:MM:SS. When I set @abc = 2011-08-02, the result is the same, which is incorrect.
Kind regards,
Marcin

In T-SQL a datetime is precisely that, a date and a time, so the hour part can never go past 23; anything beyond that moves the value into the next day. You could use DATEPART to extract the parts of the datetime as integers and then join them back into the string you want. Depending on your final goal, you may be better off doing this kind of work in your application or presentation layer, where general-purpose languages often have more robust datetime libraries.

I think you need to write a scalar-valued function that takes an integer argument (the time difference in seconds) and formats it as needed. For example:
CREATE FUNCTION intToDateTime ( @time_in_secs BIGINT) RETURNS VARCHAR(30)
AS
BEGIN
DECLARE @retval VARCHAR(30);
SET @retval = cast(@time_in_secs/(60*60) as varchar(10))+':'+  -- whole hours
cast( (@time_in_secs-@time_in_secs/(60*60)*3600)/60 as varchar(10))+':'+  -- remaining minutes
cast( (@time_in_secs-@time_in_secs/(60)*60) as varchar(10));  -- remaining seconds
return @retval;
END
This function needs some changes: you may want to display a leading zero for 0-9 (e.g. '00' instead of the '0' it currently returns), and you need to handle negative values better.
You can then use it with DATEDIFF(second, @val1, @val2).
I hope this points you in the right direction.
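The same hours/minutes/seconds split, with the leading zeros and negative handling the answer says are missing, is sketched below in Python. format_duration is a hypothetical name; hours are padded to at least two digits so 5400 seconds gives the 01:30:00 the question asks for:

```python
def format_duration(total_seconds: int) -> str:
    """Format whole seconds as HH:MM:SS, letting hours exceed 24."""
    sign = "-" if total_seconds < 0 else ""
    hours, rem = divmod(abs(total_seconds), 3600)  # whole hours, no 24-hour wrap
    minutes, seconds = divmod(rem, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{seconds:02d}"

print(format_duration(1 * 86400 + 12 * 3600 + 5 * 60))  # 1 day 12 h 5 min
print(format_duration(2 * 86400 + 3600 + 10 * 60 + 5))  # 2 days 1 h 10 min 5 s
```

Taking the absolute value before divmod avoids Python's floor-division behaviour on negative numbers, so a negative gap keeps the same digits with a leading minus sign.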

select cast(cast(cast(t as float) *24 as int) as varchar) + right(convert(varchar,t, 20), 6)
from(
select cast(AVG(CAST(data-@abc as float)) as datetime) t
from
(
select 'data' as label, @date1 as data
union all
select 'data' as label, @date2 as data
union all
select 'data' as label, @date3 as data
) as a
group by label
) a
Result:
1:30:00

A datetime cannot represent durations that are not real dates and times, such as 36 hours.
However, you can produce output that looks like a time by concatenating the hours string, ':', the minutes, and so on.
Look up the DATEADD() and DATEDIFF() functions.
