Need to fix timestamps in my TimescaleDB database (the number of seconds provided to TO_TIMESTAMP was incorrect by exactly a factor of 1000) - postgresql

I have a TimescaleDB database in which some of the timestamps across several tables are incorrect: I inadvertently gave the TO_TIMESTAMP() function the number of milliseconds in Unix time instead of seconds. As a result, each of these data points is 1000 times further from 1970 than it should be. I can easily isolate the rows that need fixing with a check for future dates in the WHERE clause, but I am a little stuck on how to convert and replace the incorrect timestamps. I essentially need to get the Unix time representation, divide it by 1000, and write that value back to the row, but my SQL is too rusty to piece this query together.
I see that I can use extract(epoch from ) to get the number of seconds, but how to do this for every row and then update its timestamp is not clear to me.
Edit:
When using the query:
UPDATE table_name
SET time = TO_TIMESTAMP(extract(epoch from time) / 1000.0)
WHERE
time > '2020-01-01 00:00:00';
I get the error:
new row for relation "_hyper_8_295_chunk" violates check constraint
"constraint_295"

It would probably be best to create a new hypertable and run an INSERT INTO ... SELECT from the old hypertable into the new one, potentially in batches. The reason is that Timescale restricts updates to the partitioning key so that rows don't move between partitions (chunks), which is why the UPDATE violates the chunk's check constraint. You can do a delete followed by an insert to get a similar effect, but it will be more efficient to create a new hypertable, move everything over with the corrected timestamps, and then rename it than to try doing updates.
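A minimal sketch of the new-hypertable approach, assuming the time column is timestamptz. The columns device_id and value and the name table_name_fixed are hypothetical placeholders for the real schema; the cutoff date is the one used in the question.

```sql
-- Hypothetical schema: replace device_id and value with the real columns.
CREATE TABLE table_name_fixed (
    time      timestamptz NOT NULL,
    device_id integer,
    value     double precision
);

-- Turn the empty table into a hypertable partitioned on time.
SELECT create_hypertable('table_name_fixed', 'time');

-- Copy every row, dividing the epoch by 1000 only for the far-future rows.
INSERT INTO table_name_fixed (time, device_id, value)
SELECT CASE
           WHEN time > '2020-01-01 00:00:00'
               THEN to_timestamp(extract(epoch FROM time) / 1000.0)
           ELSE time
       END,
       device_id,
       value
FROM table_name;

-- After verifying the copy, swap the names.
ALTER TABLE table_name RENAME TO table_name_old;
ALTER TABLE table_name_fixed RENAME TO table_name;
```

For a large table, the INSERT can be run in batches by adding a time range to a WHERE clause on the SELECT. Any indexes or policies defined on the old hypertable would likely need to be recreated on the new one.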

Related

How can I speed up a Postgres query, in which I want to query all entries within a date range

So I'm doing this query
select * from table where time>'2019-01-28 04:13:36.790000' and time<'2019-01-28 04:13:46.790000';
It used to be very fast, but as the table grew it now takes several minutes to complete. I'm not sure exactly how many entries are in the table; I'm guessing tens of millions. I just want to be able to query entries in a given time interval. Is there anything I can do to the table to make this quicker?
It's hard to say for sure without more context, but if you don't already have an index on time, consider adding one.
CREATE INDEX idx_table_time ON table (time ASC);
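One way to check whether an index on time already exists, and whether the planner actually uses it, is sketched below. Here my_table is a hypothetical stand-in for the real table name (table itself is a reserved word and would need quoting); pg_indexes is a standard PostgreSQL system view.

```sql
-- List the indexes that already exist on the table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'my_table';

-- Run the range query and show the actual plan; look for an Index Scan
-- or Bitmap Index Scan on the time index rather than a Seq Scan.
EXPLAIN ANALYZE
SELECT *
FROM my_table
WHERE time > '2019-01-28 04:13:36.790000'
  AND time < '2019-01-28 04:13:46.790000';
```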

Calculate the sum of time column in PostgreSql

Can anyone suggest me, the easiest way to find summation of time field in POSTGRESQL. i just find a solution for MYSQL but i need the POSTGRESQL version.
MYSQL: https://stackoverflow.com/questions/3054943/calculate-sum-time-with-mysql
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(timespent))) FROM myTable;
Demo Data
id time
1 1:23:23
2 4:00:23
3 9:23:23
Desired Output
14:47:09
What you want is not possible as such, but you have probably misunderstood the time type: it represents a precise point in time within a day. It doesn't make much sense to add two (or more) times, e.g. '14:00' + '14:00' = '28:00' (but there is no 28th hour in a day).
What you probably want is interval (which represents time intervals: hours, minutes, or even years). sum() supports interval arguments.
If you use intervals, it's that simple:
SELECT sum(interval_col) FROM my_table;
Although, if you stick to the time type (but you have no reason to do that), you can cast it to interval to calculate with it:
SELECT sum(time_col::interval) FROM my_table;
But again, the result will be interval, because time values cannot exceed the 24th hour in a day.
Note: PostgreSQL will even do the cast for you, so sum(time_col) should work too, but the result is interval in this case too.
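The '14:00' + '14:00' case from above can be checked directly. This is a small self-contained sketch using an inline VALUES list, so no table is needed; the names v, t, and total are arbitrary.

```sql
-- Sum two time values by casting them to interval first.
SELECT sum(t::interval) AS total
FROM (VALUES ('14:00'::time), ('14:00'::time)) AS v(t);
-- total: 28:00:00 (an interval, not a time)
```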
I tried this solution on SQL Fiddle.
Table creation:
CREATE TABLE time_table (
id integer, time time
);
Insert data:
INSERT INTO time_table (id,time) VALUES
(1,'1:23:23'),
(2,'4:00:23'),
(3,'9:23:23');
Query the data:
SELECT
sum(s.time)
FROM
time_table s;
If you need to calculate the sum of a field grouped by another field, you can do this:
select
keyfield,
sum(time_col::interval) totaltime
FROM myTable
GROUP BY keyfield;
Output example:
keyfield; totaltime
"Gabriel"; "10:00:00"
"John"; "36:00:00"
"Joseph"; "180:00:00"
Data type of totaltime is interval.

Add datetime constraint to a PostgreSQL multi-column partial index

I've got a PostgreSQL table called queries_query, which has many columns.
Two of these columns, created and user_sid, are frequently used together in SQL queries by my application to determine how many queries a given user has done over the past 30 days. It is very, very rare that I query these stats for any time older than the most recent 30 days.
Here is my question:
I've currently created my multi-column index on these two columns by running:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
But I'd like to further restrict the index to only care about those queries in which the created date is within the past 30 days. I've tried doing the following:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
WHERE created >= NOW() - '30 days'::INTERVAL
But this throws an exception stating that my function must be immutable.
I'd love to get this working so that I can optimize my index, and cut back on the resources Postgres needs to do these repeated queries.
You get an exception using now() because the function is not IMMUTABLE (obviously) and, quoting the manual:
All functions and operators used in an index definition must be "immutable" ...
I see two ways to utilize a (much more efficient) partial index:
1. Partial index with condition using constant date:
CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp;
Assuming created is actually defined as timestamp. It wouldn't work to provide a timestamp constant for a timestamptz column (timestamp with time zone). The cast from timestamp to timestamptz (or vice versa) depends on the current time zone setting and is not immutable. Use a constant of matching data type. Understand the basics of timestamps with / without time zone:
Ignoring time zones altogether in Rails and PostgreSQL
Drop and recreate that index at hours with low traffic, maybe with a cron job on a daily or weekly basis (or whatever is good enough for you). Creating an index is pretty fast, especially a partial index that is comparatively small. This solution also doesn't need to add anything to the table.
Assuming no concurrent access to the table, automatic index recreation could be done with a function like this:
CREATE OR REPLACE FUNCTION f_index_recreate()
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
DROP INDEX IF EXISTS queries_recent_idx;
EXECUTE format('
CREATE INDEX queries_recent_idx
ON queries_query (user_sid, created)
WHERE created > %L::timestamp'
, LOCALTIMESTAMP - interval '30 days'); -- timestamp constant
-- , now() - interval '30 days'); -- alternative for timestamptz
END
$func$;
Call:
SELECT f_index_recreate();
now() (like you had) is the equivalent of CURRENT_TIMESTAMP and returns timestamptz. Cast to timestamp with now()::timestamp or use LOCALTIMESTAMP instead.
Select today's (since midnight) timestamps only
If you have to deal with concurrent access to the table, use DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY. But you can't wrap these commands into a function because, per documentation:
... a regular CREATE INDEX command can be performed within a
transaction block, but CREATE INDEX CONCURRENTLY cannot.
So, with two separate transactions:
CREATE INDEX CONCURRENTLY queries_recent_idx2 ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp; -- your new condition
Then:
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;
Optionally, rename to old name:
ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
2. Partial index with condition on "archived" tag
Add an archived tag to your table:
ALTER TABLE queries_query ADD COLUMN archived boolean NOT NULL DEFAULT FALSE;
UPDATE the column at intervals of your choosing to "retire" older rows and create an index like:
CREATE INDEX some_index_name ON queries_query (user_sid, created)
WHERE NOT archived;
Add a matching condition to your queries (even if it seems redundant) to allow them to use the index. Check with EXPLAIN ANALYZE whether the query planner catches on: it should be able to use the index for queries on a newer date. But it won't understand more complex conditions that do not match exactly.
You don't have to drop and recreate the index, but the UPDATE on the table may be more expensive than index recreation and the table gets slightly bigger.
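A sketch of what the periodic "retire" step and a matching query might look like, assuming created is a timestamp column. The 30-day window comes from the question; the user_sid value is a hypothetical placeholder and its type may differ in the real table.

```sql
-- Retire rows older than 30 days (run periodically, e.g. from a cron job).
UPDATE queries_query
SET    archived = TRUE
WHERE  NOT archived
AND    created < LOCALTIMESTAMP - interval '30 days';

-- The query repeats NOT archived so the planner can match the partial index.
SELECT count(*)
FROM   queries_query
WHERE  user_sid = 42          -- hypothetical user id
AND    created > LOCALTIMESTAMP - interval '30 days'
AND    NOT archived;
```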
I would go with the first option (index recreation). In fact, I am using this solution in several databases. The second incurs more costly updates.
Both solutions retain their usefulness over time, though performance slowly deteriorates as more outdated rows are included in the index.

How to query just the last record of every day

I have a table power with a datetimetz field called sample_time and a column called amp_hours.
The amp_hours field gains a record about every two minutes and is reset every night at midnight.
I would like to see sample_time and amp_hours for the last record of every day.
I'm very new to SQL so I may be overlooking an obvious answer.
I saw a post on how to select the last record of a group, but I'm not familiar enough with SQL to get it to work for datetimes.
I thought to use lead() or lag() to compare the date of a record with the next record but I'm using postgresql 8.3 and I think windowing was introduced in 8.4.
Try this:
SELECT DISTINCT ON (sample_time::date) sample_time, amp_hours
FROM power
ORDER BY sample_time::date DESC, sample_time DESC;

PostgreSQL does not order timestamp column correctly

I have a table in a PostgreSQL database with a column of TIMESTAMP WITHOUT TIME ZONE type. I need to order the records by this column and apparently PostgreSQL has some trouble doing it as both
...ORDER BY time_column
and
...ORDER BY time_column DESC
give me the same order of elements for my 3-element sample of records having the same time_column value, except the amount of milliseconds in it.
It seems that while sorting, it does not consider milliseconds in the value.
I am sure the milliseconds are in fact stored in the database because when I fetch the records, I can see them in my DateTime field.
When I first load all the records and then order them by the time_column in memory, the result is correct.
Am I missing some option to make the ordering behave correctly?
EDIT: I was apparently missing a lot. The problem was not in PostgreSQL, but in NHibernate stripping the milliseconds off the DateTime property.
It's a foolish notion that PostgreSQL wouldn't be able to sort timestamps correctly.
Run a quick test and rest assured:
CREATE TEMP TABLE t (x timestamp without time zone);
INSERT INTO t VALUES
('2012-03-01 23:34:19.879707')
,('2012-03-01 23:34:19.01386')
,('2012-03-01 23:34:19.738593');
SELECT x FROM t ORDER BY x DESC;
SELECT x FROM t ORDER BY x;
q.e.d.
Then try to find out what's really happening in your query. If you can't, post a test case and you will be helped presto pronto.
Try casting your column to timestamp, like this:
SELECT * FROM TABLE
ORDER BY time_column::timestamp