Within PostgreSQL, I'm trying to write a query that calculates the time difference between the timestamp of the first row and the timestamp of the last row:
(select public."ImportLogs"."DateTimeStamp" as migration_start
 from public."ImportLogs"
 order by public."ImportLogs"."DateTimeStamp" asc
 limit 1)
union
(select public."ImportLogs"."DateTimeStamp" as migration_end
 from public."ImportLogs"
 order by public."ImportLogs"."DateTimeStamp" desc
 limit 1);
I tried to get the time difference between migration_start and migration_end, but I couldn't get it to work. How can I achieve this?
We can subtract min(DateTimeStamp) from max(DateTimeStamp) and cast the difference as time.
select
    cast(
        max("DateTimeStamp")
        - min("DateTimeStamp")
    as time) as TimeDifference
from public."ImportLogs";

| timedifference |
| :------------- |
| 00:00:10       |
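Note that casting to time wraps around once the difference exceeds 24 hours. For longer migrations, a minimal sketch of an alternative against the same table keeps the raw interval, or converts it to seconds:

-- the raw interval survives differences over 24 hours;
-- extract(epoch from ...) gives the total number of seconds
select
    max("DateTimeStamp") - min("DateTimeStamp") as diff_interval,
    extract(epoch from max("DateTimeStamp") - min("DateTimeStamp")) as diff_seconds
from public."ImportLogs";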
I have a table in my Postgres database with two main fields: agent_id and quoted_at.
I need to group the data by agent_id and calculate the mean difference between consecutive quoted_at values.
So, for example, if I have the following rows:
agent_id | quoted_at
---------+-----------
1 | 2020-04-02
1 | 2020-04-04
1 | 2020-04-05
The mean difference would be calculated as:
( (2020-04-05 - 2020-04-04) + (2020-04-04 - 2020-04-02) ) / 2 = 1.5 days
What I want to see after grouping the info is:
agent_id | mean
---------+---------
1 | 1.5 days
I know that, in the end, I just need to calculate (last - first) / (#_occurrences - 1).
It is just not really clear how (and if) it is possible to do that in a single query on Postgres.
Use the lag() window function to calculate your differences. Once you have those differences, use the avg() aggregation function.
with diffs as (
    select agent_id, quoted_at,
           quoted_at - lag(quoted_at) over (partition by agent_id
                                            order by quoted_at) as diff_days
    from your_table
)
select agent_id, avg(diff_days) as mean
from diffs
where diff_days is not null
group by agent_id;
The where filter removes the first record for each agent, whose diff_days is null. Strictly speaking avg() ignores nulls anyway, so the filter is optional, but it makes the intent explicit.
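As the question already observes, the same mean is just (last - first) / (#_occurrences - 1), so the window function can be skipped entirely. A sketch of that shortcut, assuming quoted_at is a date as in the example (for a timestamp column, drop the ::numeric cast; dividing the resulting interval by the count also works):

-- nullif turns the divisor into null for agents with a single row,
-- avoiding a division-by-zero error
select agent_id,
       (max(quoted_at) - min(quoted_at))::numeric
           / nullif(count(*) - 1, 0) as mean
from your_table
group by agent_id;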
The purpose of this question is to optimize some SQL by using set-based operations vs iterative (looping, like I'm doing below):
Some Explanation -
I have this CTE that is inserted into a temp table #dataForPeak. Each row represents a minute, together with the value retrieved for that minute.
For every row, my code uses a while loop to add 15 rows at a time (the current row plus the next 14 rows). These sums are inserted into another temp table #PeakDemandIntervals, which is my workaround for then **finding the max sum of these groups of 15**.
I've bolded my end goal above. My code achieves this, but it takes about 12 seconds for 26k rows. I'll be looking at much more data, so I know this is not fast enough for my use case.
My question is: can anyone help me find a faster alternative to this loop?
It can include more tables, CTEs, nested queries, whatever. The while loop might not even be the issue; it's probably the inner code.
insert into #dataForPeak
select timestamp, value
from cte
order by timestamp;

while @@ROWCOUNT <> 0
begin
    declare @timestamp datetime = (select top 1 timestamp from #dataForPeak order by timestamp);
    insert into #PeakDemandIntervals
    select @timestamp, sum(interval.value) as peak
    from (select * from #dataForPeak base
          where base.timestamp >= @timestamp
            and base.timestamp < DATEADD(minute, 15, @timestamp)
         ) interval;
    delete from #dataForPeak where timestamp = @timestamp;
end
select max(peak)
from #PeakDemandIntervals;
Edit
Here's an example of my goal, using groups of 3min instead of 15min.
Given the data:
Time | Value
1:50 | 2
1:51 | 4
1:52 | 6
1:53 | 8
1:54 | 6
1:55 | 4
1:56 | 2
the max sum (peak) I'm looking for is 20, because the group
1:52 | 6
1:53 | 8
1:54 | 6
has the highest sum.
Let me know if I need to clarify more than that.
Based on the example given, it seems like you are trying to get the maximum value of a rolling sum. You can calculate the 15-minute rolling sum very easily as follows:
SELECT [Time]
,[Value]
,SUM([Value]) OVER (ORDER BY [Time] ASC ROWS 14 PRECEDING) [RollingSum]
FROM #dataForPeak
Note the key here is the ROWS 14 PRECEDING clause. It effectively states that SQL Server should sum the preceding 14 records together with the current record, which gives you your 15-minute interval. (This assumes one row per minute with no gaps, which matches how #dataForPeak is described.)
Now you can simply take the max of the rolling sum. The full query will look as follows:
;WITH CTE_RollingSum
AS
(
SELECT [Time]
,[Value]
,SUM([Value]) OVER (ORDER BY [Time] ASC ROWS 14 PRECEDING) [RollingSum]
FROM #dataForPeak
)
SELECT MAX([RollingSum]) AS Peak
FROM CTE_RollingSum
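As a quick sanity check against the 3-minute example above (a 3-row window is ROWS 2 PRECEDING), here is a sketch with a hypothetical table variable:

declare @sample table ([Time] time, [Value] int);
insert into @sample values
    ('01:50', 2), ('01:51', 4), ('01:52', 6), ('01:53', 8),
    ('01:54', 6), ('01:55', 4), ('01:56', 2);

-- peak of the 3-minute rolling sum; returns 20 (6 + 8 + 6)
select max(RollingSum) as Peak
from (
    select sum([Value]) over (order by [Time] asc rows 2 preceding) as RollingSum
    from @sample
) s;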
I have a table with epoch values (one per minute, the epoch itself is in milliseconds) and temperatures.
select * from outdoor_temperature order by time desc;
time | value
---------------+-------
1423385340000 | 31.6
1423385280000 | 31.6
1423385220000 | 31.7
1423385160000 | 31.7
1423385100000 | 31.7
1423385040000 | 31.8
1423384980000 | 31.8
1423384920000 | 31.8
1423384860000 | 31.8
[...]
I want to get the highest single value in a given day, which I'm doing like this:
SELECT *
FROM
outdoor_temperature
WHERE
value = (
SELECT max(value)
FROM outdoor_temperature
WHERE
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney'
)
AND
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney'
ORDER BY time DESC LIMIT 1;
On my Linode, running CentOS 5 and Postgres 8.4, it returns perfectly (I get a single value, within that date, with the maximum temperature). On my MacBook Pro with Postgres 9.3.5, however, the exact same query against the exact same data doesn't return anything. I started simplifying everything to work out what was going wrong, and got to here:
SELECT max(value)
FROM outdoor_temperature
WHERE
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney';
max
-----
(1 row)
It's empty, and yet returning one row?!
My questions are:
Firstly, why is that query working against Postgres 8.4 and doing something different on 9.3.5?
Secondly, is there a much simpler way to achieve what I'm trying to do? I feel like there should be but if so I've not managed to work it out. This ultimately needs to work on Postgres 8.4.
I'm not really sure why you're getting no results on one machine but not the other; most likely it comes down to the session's TimeZone setting. '2015-02-05' at time zone 'Australia/Sydney' yields a timestamp with time zone, not a date, so the ::date value on the left has to be cast back for the comparison using the server's current time zone, which presumably differs between your Linode and your MacBook.
But you really should use another query for selecting a date, as your query would not be able to use an index.
You should select like this:
select max(value)
from outdoor_temperature
where time >= extract(
        epoch from
        '2015-02-05'::timestamp at time zone 'Australia/Sydney'
      ) * 1000
  and time < extract(
        epoch from
        ('2015-02-05'::timestamp + '1 day'::interval) at time zone 'Australia/Sydney'
      ) * 1000;
This is much simpler. Note the * 1000: your time column stores milliseconds, while extract(epoch ...) returns seconds. Written this way, the database can use an index on time, which should be the primary key (with its automatic index).
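If you want the whole row rather than just the maximum value (your original query's goal), a sketch under the same assumptions that avoids the max() subquery entirely:

select *
from outdoor_temperature
where time >= extract(epoch from '2015-02-05'::timestamp at time zone 'Australia/Sydney') * 1000
  and time < extract(epoch from ('2015-02-05'::timestamp + '1 day'::interval) at time zone 'Australia/Sydney') * 1000
order by value desc, time desc
limit 1;

-- ordering by value desc, time desc picks the latest row
-- when several rows share the maximum temperature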
The classification_indicator_id column contains small integer codes. I would like to sum these values in a one-day series. I wrote the query below:
select
a.data_start::date,
a.segment1::integer as "Segment1"
from (
select
data as data_start,
(select sum(classification_indicator_id) from classifications where classification_indicator_id = 3)::integer as segment1
from
generate_series('2013-03-25'::timestamp without time zone,
'2013-04-01'::timestamp without time zone,
'1 day'::interval) data
) a
group by
a.data_start,
a.segment1
ORDER BY data_start
But I always get something like:
date start|segment1
-------------------
2013-03-25|39
2013-03-26|39
2013-03-27|39
2013-03-28|39
2013-03-29|39
2013-03-30|39
2013-03-31|39
2013-04-01|39
I am sure it should be something like:
date start|segment1
-------------------
2013-03-25|3
2013-03-26|4
2013-03-27|7
2013-03-28|9
2013-03-29|15
2013-03-30|22
2013-03-31|19
2013-04-01|5
select
    data.d::date,
    coalesce(sum(c.classification_indicator_id), 0)::integer as "Segment1"
from
    classifications c
right join
    generate_series(
        '2013-03-25'::timestamp without time zone,
        '2013-04-01'::timestamp without time zone,
        '1 day'::interval
    ) data(d) on data.d::date = c.created::date
             and c.classification_indicator_id = 3
group by 1
ORDER BY 1
Note that the classification_indicator_id = 3 test has to live in the on clause: putting it in a where clause would discard the days that have no matching rows, defeating the outer join.
I need to add some other columns (with other classification_indicator_id values).
I modified your answer a bit:
select
data.d::date as "data",
sum(c.classification_indicator_id)::integer as "Segment1",
sum(c4.classification_indicator_id)::integer as "Segment2",
sum(c5.classification_indicator_id)::integer as "Segment3"
from
generate_series(
'2013-03-25'::timestamp without time zone,
'2013-04-01'::timestamp without time zone,
'1 day'::interval
) data(d)
left join classifications c on (data.d::date = c.created::date and c.classification_indicator_id = 3)
left join classifications c4 on (data.d::date = c4.created::date and c4.classification_indicator_id = 4)
left join classifications c5 on (data.d::date = c5.created::date and c5.classification_indicator_id = 5)
group by "data"
ORDER BY "data"
But it is still not working properly. The sum in each row is too big, and it grows when I add additional columns:
With 3 columns:

data       | Segment1 | Segment2
-----------+----------+---------
2013-03-25 | 12       | 16
2013-03-26 | 18       | 24

With 4 columns:

data       | Segment1 | Segment2 | Segment3
-----------+----------+----------+---------
2013-03-25 | 12       | 16       | 20
2013-03-26 | 108      | 144      | 180
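A sketch of one way to fix this: join classifications only once and split the sums with conditional aggregation, so extra Segment columns cannot multiply rows. filter is PostgreSQL 9.4+; on older versions a case expression inside sum() does the same. This assumes the date column is created, as in the query above:

select
    data.d::date as "data",
    coalesce(sum(c.classification_indicator_id) filter (where c.classification_indicator_id = 3), 0)::integer as "Segment1",
    coalesce(sum(c.classification_indicator_id) filter (where c.classification_indicator_id = 4), 0)::integer as "Segment2",
    coalesce(sum(c.classification_indicator_id) filter (where c.classification_indicator_id = 5), 0)::integer as "Segment3"
from generate_series(
         '2013-03-25'::timestamp without time zone,
         '2013-04-01'::timestamp without time zone,
         '1 day'::interval
     ) data(d)
left join classifications c on data.d::date = c.created::date
group by 1
order by 1;

-- classifications is joined exactly once, so adding more
-- Segment columns no longer inflates the sums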
This query works great for finding the difference between successive rows:
select id, created_at,
       created_at - lag(created_at, 1) over (order by created_at) as diff
from fistbumps
where bumper_id = 2543
  and created_at between '2012-01-11' and '2012-01-12'
order by created_at;
...but the results come out like:
id | created_at | diff
--------+----------------------------+-----------------
197230 | 2012-01-11 00:04:31.774426 |
197231 | 2012-01-11 00:04:32.279181 | 00:00:00.504755
197232 | 2012-01-11 00:04:33.961665 | 00:00:01.682484
197233 | 2012-01-11 00:04:36.506685 | 00:00:02.54502
What would be really groovy is if I could format that diff column to just seconds and millis (e.g. 2.54502). I tried using date_trunc() and extract(), but I can't seem to get the syntax right.
The result of created_at - lag(created_at) is a value of type interval.
You can get the seconds of an interval using extract(epoch from interval_value)
So in your case it would be:
extract(epoch from created_at - lag(created_at, 1) over (order by created_at))
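Dropped into the original query from the question, that looks like:

select id, created_at,
       extract(epoch from
               created_at - lag(created_at, 1) over (order by created_at)
       ) as diff
from fistbumps
where bumper_id = 2543
  and created_at between '2012-01-11' and '2012-01-12'
order by created_at;

-- diff is now a plain number of seconds with fractional millis, e.g. 2.54502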