max(value) returning 1 empty row on Postgres 9.3.5, works on 8.4

I have a table with epoch values (one per minute, the epoch itself is in milliseconds) and temperatures.
select * from outdoor_temperature order by time desc;
time | value
---------------+-------
1423385340000 | 31.6
1423385280000 | 31.6
1423385220000 | 31.7
1423385160000 | 31.7
1423385100000 | 31.7
1423385040000 | 31.8
1423384980000 | 31.8
1423384920000 | 31.8
1423384860000 | 31.8
[...]
I want to get the highest single value in a given day, which I'm doing like this:
SELECT *
FROM
outdoor_temperature
WHERE
value = (
SELECT max(value)
FROM outdoor_temperature
WHERE
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney'
)
AND
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney'
ORDER BY time DESC LIMIT 1;
On my Linode, running CentOS 5 and Postgres 8.4, it returns perfectly (I get a single value, within that date, with the maximum temperature). On my MacBook Pro with Postgres 9.3.5, however, the exact same query against the exact same data doesn't return anything. I started simplifying everything to work out what was going wrong, and got to here:
SELECT max(value)
FROM outdoor_temperature
WHERE
((timestamp with time zone 'epoch' + (time::float/1000) * interval '1 second') at time zone 'Australia/Sydney')::date
= '2015-02-05' at time zone 'Australia/Sydney';
max
-----
(1 row)
It's empty, and yet returning one row?!
My questions are:
Firstly, why is that query working against Postgres 8.4 and doing something different on 9.3.5?
Secondly, is there a much simpler way to achieve what I'm trying to do? I feel like there should be but if so I've not managed to work it out. This ultimately needs to work on Postgres 8.4.

I'm not sure why you're getting no results; it looks as though there is simply no data matching that day.
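On the "empty, and yet returning one row" part: an aggregate with no GROUP BY always returns exactly one row, and max() over zero input rows is NULL, which psql prints as a blank. A minimal sketch of that behaviour, using Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example runs without a server; the single row is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outdoor_temperature (time INTEGER PRIMARY KEY, value REAL)")
conn.execute("INSERT INTO outdoor_temperature VALUES (1423385340000, 31.6)")

# A WHERE clause that matches nothing: the aggregate still yields one row,
# and that row holds NULL (None in Python).
rows = conn.execute(
    "SELECT max(value) FROM outdoor_temperature WHERE time < 0"
).fetchall()
print(rows)
```

So one blank row means the WHERE clause matched no rows at all on 9.3.5, not that max() itself misbehaved.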
You should use a different query to select a date anyway, because yours cannot use an index on time.
Compare the raw column against the bounds of the day instead. Since time is in milliseconds and extract(epoch from ...) returns seconds, multiply the bounds by 1000:
select max(value) from outdoor_temperature where
time >= extract(
epoch from
'2015-02-05'::timestamp at time zone 'Australia/Sydney'
) * 1000
and
time < extract(
epoch from
('2015-02-05'::timestamp + '1 day'::interval) at time zone 'Australia/Sydney'
) * 1000
;
This is much simpler, and it lets the database use an index on time, which should be the primary key (and is therefore indexed automatically).
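The bounds have to be in the same unit as the column, which the question says is milliseconds. As a cross-check outside the database, here is a short Python sketch that computes the half-open millisecond range for one Sydney calendar day. It assumes Python 3.9+ with the zoneinfo module and a system time zone database; the helper name day_bounds_ms is made up for illustration.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def day_bounds_ms(year, month, day, tz_name):
    """Return (start_ms, end_ms) for a local calendar day as epoch milliseconds."""
    tz = ZoneInfo(tz_name)
    start = datetime(year, month, day, tzinfo=tz)  # local midnight
    end = start + timedelta(days=1)                # next local midnight (wall-clock arithmetic)
    # timestamp() is seconds since the epoch; the table stores milliseconds.
    return int(start.timestamp()) * 1000, int(end.timestamp()) * 1000

start_ms, end_ms = day_bounds_ms(2015, 2, 5, "Australia/Sydney")
print(start_ms, end_ms)
```

A row belongs to that day when start_ms <= time < end_ms, which is the same half-open test the query performs.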

Related

Postgresql timestamp difference greater than 1 hour

Hi, I have entrytime and exittime timestamp columns in my database. How can I query it to display only the rows where the person exited more than an hour later?
Select * from store where EXTRACT(EPOCH FROM (exittime - entrytime))/3600 >60
That's what I have so far, but it won't work. Any help would be appreciated.
Just subtract the values and compare the result with an interval:
Select *
from store
where exittime - entrytime > interval '1 hour';
This assumes that both columns are defined as timestamptz or timestamp.
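As for why the original condition returns nothing useful: dividing the seconds by 3600 yields hours, so comparing with 60 only matches stays longer than 60 hours. The same arithmetic in a small Python sketch (the sample times are made up):

```python
from datetime import datetime, timedelta

entrytime = datetime(2020, 1, 1, 9, 0)
exittime = datetime(2020, 1, 1, 10, 30)  # 90 minutes later

stay = exittime - entrytime              # a timedelta, comparable to a Postgres interval

# Original test: seconds / 3600 is hours, so "> 60" means "more than 60 hours".
original_test = stay.total_seconds() / 3600 > 60
# Comparing the duration with a one-hour interval directly.
interval_test = stay > timedelta(hours=1)

print(original_test, interval_test)
```

Comparing against an interval avoids the unit conversion altogether.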

where with two columns time without zone: greatest

I have two columns: one is a time, the other a timestamp.
ALTER TABLE public.tour
ADD COLUMN reprocess_toupdate timestamp without time zone DEFAULT NOW();
ALTER TABLE public.tour
ADD COLUMN reprocess_updated time without time zone DEFAULT NOW();
when I execute:
select reprocess_toupdate, reprocess_updated
from tour
where reprocess_toupdate::date > reprocess_updated::date;
I get an error:
ERROR: cannot cast type time without time zone to date
Without ::date, I get this error:
ERROR: operator does not exist: timestamp without time zone > time without time zone
That is because a TIME column does not have a date component. Its range of values is 00:00:00 to 24:00:00; see Section 8.5, Date/Time Types, of the documentation. Since it has no date component, you cannot cast it to date. The proper solution would be to change the type to "timestamp without time zone". If that is not possible, either compare just the times, or "reattach" the date and then compare:
with dateset as
(select '2019-06-02 13:00:00'::time without time zone tm, (now() - interval '1 day')::timestamp without time zone dt)
select tm, dt, date_trunc('day', dt)+tm redt from dateset
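The "reattach" step is simply date plus time of day. A rough Python equivalent of date_trunc('day', dt) + tm, with made-up values, in case it helps to see the pieces separately:

```python
from datetime import datetime, time

tm = time(13, 0, 0)                    # the time-only column value
dt = datetime(2019, 6, 1, 10, 16, 43)  # the timestamp column value

# Keep dt's calendar date and replace its time of day with tm.
redt = datetime.combine(dt.date(), tm)
print(redt)
```

The result is a full timestamp, so it can be compared with the other timestamp column directly.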
Works here:
create temporary table so (id serial primary key, ts timestamp default now());
insert into so (ts) values (now());
select * from so where ts::date < now();
Output:
+------+----------------------------+
| id | ts |
|------+----------------------------|
| 1 | 2019-07-01 10:16:43.093662 |
+------+----------------------------+

Postgres search available time slots with generate_series

I have a table in my postgres database which has a column of dates. I want to search which of those dates is missing - for example:
date
2016-11-09 18:30:00
2016-11-09 19:00:00
2016-11-09 20:15:00
2016-11-09 22:20:00
2016-11-09 23:00:00
Here, 2016-11-09 21:00:00 is missing. If my table has an entry that falls within a slot of my generated series (each slot is 1 hour long), I need to remove that slot.
I want to write a query with generate_series that returns the missing date. Is this possible?
This is the sample query I used to generate the series:
SELECT t
FROM generate_series(
TIMESTAMP WITH TIME ZONE '2016-11-09 18:00:00',
TIMESTAMP WITH TIME ZONE '2016-11-09 23:00:00',
INTERVAL '1 hour'
) t
EXCEPT
SELECT tscol
FROM mytable;
But this query does not remove the slots containing 2016-11-09 18:30:00, 2016-11-09 20:15:00, etc., because I used EXCEPT.
This is not a gaps-and-islands problem. You just want to find the 1-hour intervals for which no record exists in the table.
EXCEPT does not work here because it compares for equality, while you want to check whether a record exists within a range.
A typical solution for this is an anti-join written as a left join:
select dt
from generate_series(
timestamp with time zone '2016-11-09 18:00:00',
timestamp with time zone '2016-11-09 23:00:00',
interval '1 hour'
) d(dt)
left join mytable t
on t.tscol >= dt and t.tscol < dt + interval '1 hour'
where t.tscol is null
You can also use not exists:
select dt
from generate_series(
timestamp with time zone '2016-11-09 18:00:00',
timestamp with time zone '2016-11-09 23:00:00',
interval '1 hour'
) d(dt)
where not exists (
select 1
from mytable t
where t.tscol >= dt and t.tscol < dt + interval '1 hour'
)
In this demo on DB Fiddle, both queries return:
| dt |
| :--------------------- |
| 2016-11-09 21:00:00+00 |
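To see why the range test succeeds where EXCEPT fails, here is the same logic in plain Python over the sample timestamps from the question (naive datetimes; the variable names are mine):

```python
from datetime import datetime, timedelta

rows = [datetime(2016, 11, 9, h, m) for h, m in
        [(18, 30), (19, 0), (20, 15), (22, 20), (23, 0)]]

hour = timedelta(hours=1)
# Same slots as generate_series(18:00, 23:00, '1 hour'), both ends included.
slots = [datetime(2016, 11, 9, 18) + i * hour for i in range(6)]

# Equality comparison (what EXCEPT does): only exact matches are removed.
except_result = [s for s in slots if s not in rows]
# Range comparison (what the anti-join does): a slot is free only if
# no row falls in [s, s + 1 hour).
free = [s for s in slots if not any(s <= r < s + hour for r in rows)]

print(except_result)
print(free)
```

The equality version still reports 18:00, 20:00 and 22:00 as missing, while the range version leaves only 21:00.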

date at time zone related syntax and semantic differences

Question: How is query 1 semantically different from query 2?
Background:
I am extracting data from a table in a database that runs in my local time zone (AT TIME ZONE 'America/New_York').
The table has data for various time zones, such as America/Los_Angeles and America/North_Dakota/New_Salem.
(Postgres stores the table data for these various time zones in my local time zone.)
So every time I retrieve data for a location other than my local one, I convert it to the relevant time zone for evaluation purposes.
Query 1:
test_db=# select count(id) from click_tb where date::date AT TIME ZONE 'America/Los_Angeles' = '2017-05-22'::date AT TIME ZONE 'America/Los_Angeles';
count
-------
1001
(1 row)
Query 2:
test_db=# select count(id) from click_tb where (date AT TIME ZONE 'America/Los_Angeles')::date = '2017-05-22'::date;
count
-------
5
(1 row)
Table structure:
test_db=# \d+ click_tb
Table "public.click_tb"
Column | Type | Modifiers | Storage | Stats target | Description
-----------------------------------+--------------------------+-------------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('click_tb_id_seq'::regclass) | plain | |
date | timestamp with time zone | | plain | |
Indexes:
"click_tb_id" UNIQUE CONSTRAINT, btree (id)
"click_tb_date_index" btree (date)
Query 1 and query 2 do not produce consistent results.
As far as my tests show, query 3 below does what I need semantically.
Your critical feedback is welcome.
Query 3:
test_db=# select count(id) from click_tb where ((date AT TIME ZONE 'America/Los_Angeles')::timestamp with time zone)::date = '2017-05-22'::date;
Do not convert the timestamp field. Instead, do a range query. Since your data is already using a timestamp with time zone type, just set the time zone of your query accordingly.
set TimeZone = 'America/Los_Angeles';
select count(id) from click_tb
where date >= '2017-01-02'
and date < '2017-01-03';
Note how this uses a half-open interval of the dates (starting at the start of day in the set time zone). If you want to compute the second date from your first date, then:
set TimeZone = 'America/Los_Angeles';
select count(id) from click_tb
where date >= '2017-01-02'
and date < (timestamp with time zone '2017-01-02' + interval '1 day');
This handles daylight saving time properly and keeps the predicate sargable, so the index on date can be used.
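On the daylight saving point: a local calendar day is not always 24 hours long, which is why the upper bound is computed as "start + 1 day" in the target zone rather than as a fixed number of seconds. A small Python sketch, assuming Python 3.9+ with zoneinfo and a system time zone database, using the 2017 US spring-forward date as the example:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

la = ZoneInfo("America/Los_Angeles")

start = datetime(2017, 3, 12, tzinfo=la)  # local midnight on the spring-forward day
end = start + timedelta(days=1)           # next local midnight (wall-clock arithmetic)

# Compare as absolute instants; subtracting two datetimes that share a tzinfo
# would ignore the change in UTC offset.
hours = (end.timestamp() - start.timestamp()) / 3600
print(hours)
```

A range built from local midnights covers exactly that shorter day, whereas adding 86400 seconds to the start would spill an hour into the next one.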

Get min/max value per day with epoch times in Postgres 8.4

I have a table with epoch values (one per minute, the epoch itself is in milliseconds) and temperatures.
select * from outdoor_temperature order by time desc;
time | value
---------------+-------
1423385340000 | 31.6
1423385280000 | 31.6
1423385220000 | 31.7
1423385160000 | 31.7
1423385100000 | 31.7
1423385040000 | 31.8
1423384980000 | 31.8
1423384920000 | 31.8
1423384860000 | 31.8
[...]
I want to get the lowest value (and highest, but that can be a separate query) that occurred in each day, and the specific time (preferably the original epoch time) when that occurred. I've managed to do it with date_trunc but that gives me the general day, rather than the specific time within that day:
select
date_trunc('day',TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') as timestamp,
min(value)
from outdoor_temperature
group by timestamp
order by min asc
limit 5;
timestamp | min
------------------------+------
2015-03-27 00:00:00+10 | 10.7
2015-03-28 00:00:00+10 | 10.8
2015-01-30 00:00:00+10 | 13.6
2015-03-17 00:00:00+10 | 14.0
2015-03-29 00:00:00+10 | 14.5
(5 rows)
Is there some sort of join magic I need to do (my join-fu is extremely weak), or am I attacking this from totally the wrong direction? I tried DISTINCT ON but didn't manage to even get that working.
You can start from this query:
SELECT date_trunc('minute',TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') as timestamp, value AS temperature from _outdoor_temperature
which returns two columns; the first is the epoch value converted to a timestamp with minute precision.
Since you need to find the lowest/highest value for each day, it would be nice to also have a column with just the date rather than the timestamp:
SELECT
x.timestamp::date AS a,
x.timestamp AS b,
temperature AS c
FROM (
SELECT date_trunc('minute',TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') as timestamp, value AS temperature from _outdoor_temperature
) AS x
So now you have a date as column "a", a timestamp as column "b", and the temperature value in the last column, "c".
The last part is to use "order by" in conjunction with a "distinct on" expression. This works better than group by here, because you want one row per unique value of one column, together with the other columns of that same row:
select distinct on(y.a)
y.a,
y.b,
y.c
from (
SELECT
x.timestamp::date AS a,
x.timestamp AS b,
temperature AS c
FROM (
SELECT date_trunc('minute',TIMESTAMP WITH TIME ZONE 'epoch' + (time/1000) * INTERVAL '1 second') as timestamp, value AS temperature from _outdoor_temperature
) AS x
) y
order by y.a, y.c
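DISTINCT ON (a) with ORDER BY a, c keeps the first row of each group of equal "a" values after sorting, that is, the row with the smallest "c". The same idea as a Python sketch, with made-up rows and with days taken in UTC to keep it short (an assumption; the queries here use the session time zone):

```python
from datetime import datetime, timezone

# (epoch milliseconds, temperature); hypothetical sample rows
rows = [
    (1423353600000, 21.4),
    (1423385340000, 31.6),
    (1423440000000, 19.8),
    (1423476000000, 18.2),
]

def utc_day(ms):
    """Calendar date (UTC) of an epoch-millisecond value."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc).date()

coldest = {}
# Sort by (day, value); the first row seen for each day is that day's minimum.
for ms, value in sorted(rows, key=lambda r: (utc_day(r[0]), r[1])):
    coldest.setdefault(utc_day(ms), (ms, value))

for day, (ms, value) in coldest.items():
    print(day, ms, value)
```

Because the whole row is kept, the original epoch time of the minimum comes along with it, which is what GROUP BY with min() alone cannot give you.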
select day::date, min_value_timestamp, min_value, max_value_timestamp, max_value
from
(
select distinct on (1)
date_trunc('day', timestamp with time zone 'epoch' + time/1000 * interval '1 second') as day,
timestamp with time zone 'epoch' + (time/1000 * interval '1 second') as min_value_timestamp,
value as min_value
from outdoor_temperature
order by 1, 3
) s
inner join
(
select distinct on (1)
date_trunc('day', timestamp with time zone 'epoch' + time/1000 * interval '1 second') as day,
timestamp with time zone 'epoch' + (time/1000 * interval '1 second') as max_value_timestamp,
value as max_value
from outdoor_temperature
order by 1, 3 desc
) v using (day)
order by 1
OK, thanks to @voycheck's suggestion, I ended up adding another column of type date and populating it with just the date that corresponds to the time field, so the table looks like this:
Column | Type | Modifiers
--------+---------+-----------
time | bigint | not null
value | numeric |
date | date |
Indexes:
"outdoor_temperature_pkey" PRIMARY KEY, btree ("time")
"outdoor_temperature_date_idx" btree (date)
"outdoor_temperature_value_idx" btree (value)
Which then massively simplified and sped up the SQL query:
SELECT time, value FROM (
SELECT DISTINCT ON (date)
date, time, value
FROM outdoor_temperature
ORDER BY date, value desc
) t
ORDER BY t.value desc;