Using a PostgreSQL database, what is the best way to store time in hours, minutes and seconds? E.g. "40:21", as in 40 minutes and 21 seconds.
Example data:
20:21
1:20:02
12:20:02
40:21
time would be the obvious candidate to store time as you describe it. It enforces the range of daily time (00:00:00 to 24:00:00) and occupies 8 bytes.
interval allows arbitrary intervals, even negative ones, or a mix of positive and negative parts like '1 month -3 seconds'. That doesn't fit your description well, and it occupies 16 bytes. See:
How to get the number of days in a month?
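A quick illustration of the difference (values picked arbitrarily):
SELECT time '24:00:00';                -- valid: the upper bound of the time range
SELECT interval '1 day 12:20:02';      -- intervals may exceed 24 hours
SELECT interval '1 month -3 seconds';  -- and may mix signs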
To optimize storage size, make it an integer (4 bytes) signifying seconds. To convert time back and forth:
SELECT EXTRACT(epoch FROM time '18:55:28');  -- 68128 (seconds)
SELECT (interval '1 second' * 68128)::time;  -- '18:55:28' (back to time)
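For illustration, a minimal sketch of the integer-seconds approach (the table and column names are made up):
CREATE TABLE track (
    id       serial PRIMARY KEY,
    duration integer NOT NULL  -- length in seconds, e.g. 2421 for 40:21
);

INSERT INTO track (duration) VALUES (2421);

-- render the stored seconds as hh:mm:ss for display
SELECT (interval '1 second' * duration)::time AS duration_hms
FROM   track;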
It sounds like you want to store a length of time, or interval. PostgreSQL has a special interval type to store a length of time, e.g.
SELECT interval '2 hours 3 minutes 20 seconds';
This can be added to a timestamp in order to form a new timestamp, or multiplied (so that 2 * interval '2 hours' = interval '4 hours'). The interval type seems tailor-made for your use case.
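A quick sketch of both operations (the timestamp is chosen arbitrarily):
SELECT timestamp '2017-05-01 12:00' + interval '2 hours 3 minutes 20 seconds';
-- 2017-05-01 14:03:20
SELECT 2 * interval '2 hours';  -- 04:00:00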
Related
I have a query (with a subquery) that calculates an average of temperatures over the previous years, plus/minus one week around each day. It works, but it is not all that fast. The time series values below are just an example. I'm using doy because I want a sliding window around the same date in every year.
SELECT days,
       (SELECT avg(temperature)
        FROM temperatures
        WHERE site_id = ?
          AND extract(doy FROM timestamp) BETWEEN
              extract(doy FROM days) - 7 AND extract(doy FROM days) + 7
       ) AS temperature
FROM generate_series('2017-05-01'::date, '2017-08-31'::date, interval '1 day') days
So my question is: could this query somehow be improved? I was thinking about using some kind of window function, or possibly lag and lead. However, regular window functions only operate over a specific number of rows, whereas there can be any number of measurements within the two-week window.
I can live with what I have for now, but as the amount of data grows, so does the execution time of the query. The two latter extracts could be removed, but that brings no noticeable speed improvement and only makes the query less legible. Any help would be greatly appreciated.
The best index for your original query is
create index idx_temperatures_site_id_timestamp_doy
on temperatures(site_id, extract(doy from timestamp));
This can greatly improve your original query's performance.
While your original query is simple and readable, it has one flaw: it calculates every day's average about 15 times (each day falls within the ±7-day window of 15 different days). Instead, you could calculate these averages once per day and then take the two-week window's weighted average (the weight for a day's average needs to be the count of the individual rows in your original table). Something like this:
with p as (
    select timestamp '2017-05-01' min,
           timestamp '2017-08-31' max
)
select t.*
from p
cross join (select days,
                   sum(sum(temperature)) over pn1week / sum(count(temperature)) over pn1week
            from p
            cross join generate_series(min - interval '1 week',
                                       max + interval '1 week',
                                       interval '1 day') days
            left join temperatures on site_id = ?
                  and extract(doy from timestamp) = extract(doy from days)
            group by days
            window pn1week as (order by days rows between 7 preceding and 7 following)
           ) t
where days between min and max
order by days
However, there is not much gain here, as this is only twice as fast as your original query (with the optimal index).
http://rextester.com/JCAG41071
Notes: I used timestamp because I assumed your column's type is timestamp. But as it turned out, you use timestamptz (a.k.a. timestamp with time zone). With that type you cannot index the extract(doy from timestamp) expression, because that expression's output depends on the client's current time zone setting.
For timestamptz, use an index which (at least) starts with site_id. Using the window version should improve performance anyway.
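If you do want an expression index despite timestamptz, one workaround (a sketch, assuming all your data should be interpreted in one fixed time zone, UTC here) is to pin the zone inside the expression; that makes the expression immutable and therefore indexable, as long as your queries use the exact same expression:
create index idx_temperatures_site_id_doy_utc
    on temperatures (site_id, (extract(doy from timestamp at time zone 'UTC')));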
http://rextester.com/XTJSM42954
I know that for the interval data type, one would use extract(epoch from interval) to get the length of the interval in seconds. The decimal precision of that seems to be 5: I have a timestamp that is precise all the way to hundreds of nanoseconds (truncated to microseconds when importing into the table), but I get only 5 decimal places of precision when I use extract(epoch) on it. So if I were to multiply the value returned by extract(epoch) by 1 million, I would only get precision to tens of microseconds. Is there any functionality that can convert an interval to microseconds, or is multiplying the seconds my best option?
extract returns a double precision floating-point value, whose precision varies according to the platform.
The double precision type typically has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits.
select extract(epoch from now());
date_part
------------------
1468585846.00179
The extra_float_digits setting controls the number of extra significant digits included when a floating-point value is converted to text for output:
set extra_float_digits = 3;
select extract(epoch from now());
date_part
---------------------
1468586127.88283896
extra_float_digits can be set at the session level as above, at the database level, or at the server level.
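For example (mydb is a placeholder database name):
-- session level
set extra_float_digits = 3;

-- database level
alter database mydb set extra_float_digits = 3;

-- server level: add extra_float_digits = 3 to postgresql.conf and reload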
I need to get a timestamp in integer seconds that won't roll over.
It can be elapsed CPU seconds, or elapsed clock or epoch seconds.
'clock' gives a date/time vector in years ... seconds.
But I can't figure out how to convert this to integer seconds.
cputime returns elapsed integer seconds but "This number can overflow the internal representation and wrap around.".
What about round(3600 * 24 * now)?
According to the manual, now returns the number of days since the year 0, as a floating point number. Multiplying by 86400 should thus give seconds.
Usually it is better to use a fixed-point format for keeping track of time, but since you are only interested in integer seconds, it should not be too much of a problem. The time resolution of now due to floating point resolution can be found like this:
>> eps(now*86400)
ans =
7.6294e-06
Or almost 8 microseconds. This should be good enough for your use case. Since these are 64-bit floating point numbers, you should not have to worry about wrapping around within your lifetime.
One practical issue is that the number of seconds since the year 0 is too large to be printed as an integer on the Matlab prompt with standard settings. If that bothers you, you can do fprintf('%i\n', round(3600 * 24 * now)), or simply subtract some arbitrary number, e.g. to get the number of seconds since the year 2000 you could do
epoch = datenum(2000, 1, 1);
round(86400 * (now - epoch))
which currently prints 488406681.
I have two sets of time series data which are collected with different time intervals. One is measured every 15 minutes and the other every 1 minute.
The measured variables are oxygen concentration, oxygen saturation and time, all three of which are measured by the two instruments with their different time intervals (6 column arrays in total).
I have two times, and I want to find the indices of all the entries at 15-minute intervals in the time column that sit between them.
co = 1;
fmt = 'dd/mm/yyyy/HH/MM/SS';
% 15-minute step, expressed as a datenum fraction of a day
step = datenum('03/11/2014/00/15/00', fmt) - datenum('03/11/2014/00/00/00', fmt);
for i = datenum('03/11/2014/10/00/00', fmt) : step : datenum('03/11/2014/16/00/00', fmt)
    u = find(xyl_time == i);
    New_O2(co, 1) = xyl_o2conc(u);
    New_O2(co, 2) = xyl_o2sat(u);
    v = find(sg_time == i);
    New_O2(co, 3) = sg_o2conc(v);
    New_O2(co, 4) = sg_o2sat(v);
    co = co + 1;
end
However, this does not work. I have narrowed it down to the time interval I'm using. I want a step of every 15 minutes, but when I produce the 15-minute interval and then datestr that number, it comes up as '12:15 AM'. I think this is causing the problem, but I have no idea how to produce just times alone, i.e. I want 00:15, not 12:15 and not 00:15 AM or PM; just spacings of 15 minutes for my for loop.
I'm using PostgreSQL (on Amazon Redshift), and I need to calculate the difference between two dates and then use that value in a formula to compute a ratio, so the date difference needs to be translated into a numeric value, preferably a float or double precision.
I have two dates: 1/1/2017 and 1/1/2014. I need to find the difference between these two dates in number of days.
When I use the age function I get 1080 days:
select age('2017-01-01','2014-01-01')
However, since age returns an interval and I need to work with a numeric result, I am using EXTRACT to convert the final value. I chose epoch since I wasn't able to find any other EXTRACT field that would yield the number of time units between the two dates. This formula yields 1095.75 days (the divisor is the number of seconds in a day):
select extract(epoch from age('2017-01-01','2014-01-01'))/86400
Why am I getting a difference of 19.75 days when using age vs using extract?
Did you try
select '2017-01-01'::date - '2014-01-01'::date;
The difference between two dates is the number of days, as an integer.
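Since you need a numeric value for a ratio, the integer result can be cast directly (a minimal sketch; dividing by 365.25 is just an example):
select ('2017-01-01'::date - '2014-01-01'::date)::float / 365.25;
-- 1096 days / 365.25 ≈ 3.0007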
1080 is the figure you would get if every month were 30 days long (36 months times 30 days equals 1080), as it would be if you used justify_days (either explicitly, or if the DBMS called it implicitly). You don't say how you're getting this 1080 figure, since I believe the interval would normally just print out as something like 3 years, but that seems the most likely cause.
1095.75 seems the more correct figure, being 365.25 days multiplied by three years.
Out of those two, I would go with the latter method.
Although, as pointed out at http://www.postgresql.org/docs/8.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT, calculating the difference between two date types should yield the number of days:
select dtend - dtstart from somewhere
Redshift's release notes say they recently added a MONTHS_BETWEEN function, which looks similar to Oracle's MONTHS_BETWEEN function, if that's what you're looking for. http://docs.aws.amazon.com/redshift/latest/dg/r_MONTHS_BETWEEN_function.html
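Per the linked documentation, usage would look like this (a sketch; I haven't verified it on a live cluster):
select months_between('2017-01-01'::date, '2014-01-01'::date);
-- 36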