SQL Timestamp offset (Postgres)

I want to copy records from one database to another using Pentaho, but I ran into a problem. Let's say there is a transaction_timestamp column in Table1 with data type timestamp with time zone. Once I select records from the source DB and insert them into the other database, the values of that column are offset by an hour or so. The weirdest thing is that this doesn't even affect all records. I also tried something like this:
select
    transaction_timestamp::timestamp without time zone as transaction_timestamp,
    t1.*
from table1 t1
And it didn't work. Could the problem be that when I copy the records to the second DB, it converts all values to the local time zone? But then why doesn't the select statement I mentioned work? And why is only a part of the records affected?
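A quick way to check whether the session time zone is doing the conversion is to render the same timestamptz values under two different session settings; if the rendered values shift, the offset is happening at display/insert time rather than in storage. This is only a diagnostic sketch, and the zone name is a placeholder for the actual local zone:

-- timestamptz is stored as UTC and converted on output using the session TimeZone
SET timezone = 'UTC';
SELECT transaction_timestamp FROM table1 LIMIT 5;
SET timezone = 'Europe/Vilnius';  -- placeholder: the session's local zone
SELECT transaction_timestamp FROM table1 LIMIT 5;

A daylight-saving boundary in the conversion zone would also explain why only part of the records shift: rows on either side of the transition get different UTC offsets.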

Why is counting by i not working when selecting from the hdb works just fine?

I have a partitioned hdb and the following query works fine:
select from tableName where date within(.z.d-8;.z.d)
but the following query breaks:
select count i by date from tableName where date within(.z.d-8;.z.d)
with the following error:
"./2017.10.14/:./2017.10.15/tableName. OS reports: No such file or directory"
Any idea why this might happen?
As the error indicates, there's no table called tableName in the partition for 2017.10.15. For partitioned databases, kdb caches table counts; the caching happens when it runs the first query with the following properties:
the "select" part of the query is either count i or the partition field itself (in your example that would be date)
the where clause is either empty or constrains the partition field only.
(.Q.ps -- partitioned select -- is where all this magic happens, see the definition of it if you need all the details.)
You have several options to avoid the error you're getting.
Amend the query to avoid having either count i on its own or an empty where clause.
Any of the following will work; the first is the simplest, while the others are useful if you're writing a query for the general case and don't know the field names in advance.
select count sym by date from tableName where date within (.z.d-8;.z.d) / any field except i
select count i by date from tableName where date within (.z.d-8;.z.d),i>=0
select `dummy, count i by date from tableName where date within (.z.d-8;.z.d)
select {count x}i by date from tableName where date within (.z.d-8;.z.d)
Use .Q.view to define a sub-view that excludes partitions with missing tables; kdb won't cache or otherwise access them.
The previous solutions will not work if the date range in your select includes partitions with missing tables. In this case you can either
Run .Q.chk to create empty tables where they are missing; or
Run .Q.bv to construct the dictionary of table schemas for tables with missing partitions.
You probably need to create the missing tables. I believe that when doing a 'count i' on a partitioned table as you have done, kdb counts every single partition (not just the ones in your query) and caches these counts in .Q.pn.
If you run .Q.chk[HDB root location], it should create the missing tables and your query should work
https://code.kx.com/q/ref/dotq/#qchk-fill-hdb
'count i' will scan each partition regardless of what is specified in the where clause. So it's likely those two partitions are incomplete.
It's better to pick an actual column for things like that, or else something like
select count i>0 by date from tableName where date within(.z.d-8;.z.d)
will prevent the scanning of all partitions.
Jason

Need to fix timestamps in my TimescaleDB database (the number of seconds provided to TO_TIMESTAMP was incorrect by exactly a factor of 1000)

I have a TimescaleDB database in which some of the timestamps across several tables are incorrect: I inadvertently gave the TO_TIMESTAMP() function the number of milliseconds in Unix time instead of seconds, so all of these data points sit 1000 times farther from 1970 than they should. I can easily isolate the rows that need to be fixed with a check for future dates in the WHERE clause, but I am a little stuck on how to convert and replace the incorrect timestamps. I essentially need to get the Unix time representation, divide it by 1000, and replace that value in the row, but my SQL is too rusty to piece this query together.
I see that I can use extract(epoch from ...) to get the number of seconds, but it's not clear to me how to apply this to every row and update its timestamp.
Edit:
When using the query:
UPDATE table_name
SET time = TO_TIMESTAMP(extract(epoch from time) / 1000.0)
WHERE time > '2020-01-01 00:00:00';
I get the error:
new row for relation "_hyper_8_295_chunk" violates check constraint
"constraint_295"
I think it would probably be best to create a new hypertable and run an INSERT INTO ... SELECT from the old hypertable into the new one, potentially in batches. Timescale restricts updates of the partitioning key so that rows don't move between partitions, which is what the check-constraint error is telling you. You can do a DELETE and then an INSERT to work around that, but it's more efficient to create a new hypertable, move everything over with the corrected timestamps, and then rename it than to try doing updates.
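A minimal sketch of that approach, assuming a hypertable named conditions with a time column and a single reading column (all names here are placeholders for the actual schema):

-- clone the table structure and make the copy a hypertable
CREATE TABLE conditions_fixed (LIKE conditions INCLUDING DEFAULTS);
SELECT create_hypertable('conditions_fixed', 'time');

-- copy the data, correcting only the rows that were written with milliseconds
INSERT INTO conditions_fixed (time, reading)
SELECT CASE WHEN time > '2020-01-01 00:00:00'
            THEN TO_TIMESTAMP(extract(epoch FROM time) / 1000.0)
            ELSE time
       END,
       reading
FROM conditions;

-- after verifying the copy, swap the tables
DROP TABLE conditions;
ALTER TABLE conditions_fixed RENAME TO conditions;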

Can redshift stored procs be used to make a date range UNION ALL query

Since Redshift does not natively support date partitioning (other than in Redshift Spectrum), all our tables are date partitioned:
my_table_name_YYYY_MM_DD
So our queries usually look like this:
select columns, i, want from
(select * from tbl1_date UNION ALL
select * from tbl2_date UNION ALL
select * from tbl3_date UNION ALL
select * from tbl4_date) t;
Where there's one UNION ALL per day.
Can a stored procedure generate the date range, so our business analysts stop losing their hair when I send them a Python or bash script to generate it?
Yes, you could create a stored procedure that generates dynamic SQL using only the needed tables. See my answer here for a template to start from: Issue with passing column name as a parameter to "PREPARE" in Redshift
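For illustration, here is a hedged sketch of such a procedure; the procedure name, the table-name prefix convention, and the temp-table name are all assumptions, not anything from the linked answer. It loops over the date range, builds the UNION ALL text, and materializes the result into a temp table:

CREATE OR REPLACE PROCEDURE union_date_range(prefix VARCHAR, start_date DATE, end_date DATE)
AS $$
DECLARE
    d    DATE := start_date;
    stmt VARCHAR(MAX) := '';
BEGIN
    -- build one SELECT per day, joined by UNION ALL
    WHILE d <= end_date LOOP
        IF stmt <> '' THEN
            stmt := stmt || ' UNION ALL ';
        END IF;
        stmt := stmt || 'SELECT * FROM ' || prefix || '_'
                     || REPLACE(TO_CHAR(d, 'YYYY-MM-DD'), '-', '_');
        d := d + 1;
    END LOOP;
    -- materialize the result so analysts can query a single object
    EXECUTE 'CREATE TEMP TABLE union_result AS ' || stmt;
END;
$$ LANGUAGE plpgsql;

CALL union_date_range('my_table_name', '2021-06-01'::DATE, '2021-06-07'::DATE);
SELECT * FROM union_result;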
However, you should be aware that Redshift is able to achieve most of what you want automatically using a "Time Series Table" view. This is documented here:
Using Time Series Tables
Use Time-Series Tables
You define a view that is composed of a UNION ALL over a sequence of identical tables with a sort key defined on a commonly filtered date or timestamp column. When you query that view, Redshift is able to eliminate the scans on any UNION'ed tables that would not contain relevant data.
For example:
CREATE OR REPLACE VIEW store_sales_vw
AS SELECT * FROM store_sales_1998
UNION ALL SELECT * FROM store_sales_1999
UNION ALL SELECT * FROM store_sales_2000
UNION ALL SELECT * FROM store_sales_2001
UNION ALL SELECT * FROM store_sales_2002
UNION ALL SELECT * FROM store_sales_2003
;
SELECT cd.cd_education_status
,COUNT(*) sales_count
,AVG(ss_quantity) avg_quantity
FROM store_sales_vw vw
JOIN customer_demographics cd
ON vw.ss_cdemo_sk = cd.cd_demo_sk
WHERE ss_sold_ts BETWEEN '1999-09-01' AND '2000-08-31'
GROUP BY cd.cd_education_status
In this example Redshift will only use the store_sales_1999 and store_sales_2000 tables, skipping the other tables in the view. Note that the table skipping is not based on the name of the table: Redshift knows the MIN and MAX values of the sort key timestamp in each table.
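For the skipping to work, each member table should have the commonly filtered timestamp as its sort key. A hypothetical DDL sketch (column set trimmed to what the example query uses):

CREATE TABLE store_sales_1999 (
    ss_sold_ts   TIMESTAMP SORTKEY,  -- the commonly filtered column
    ss_cdemo_sk  BIGINT,
    ss_quantity  INTEGER
);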
If you pursue this approach, please be sure to keep the total size of the UNION fairly low. I recommend (at most) daily tables for the last week [7], weekly tables for the last month [5], quarterly tables for the last year [4], and then yearly tables for older data.
You can use ALTER TABLE … APPEND to merge the daily tables into weekly tables, and so on.
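For example, rolling one daily table into its weekly table could look like this (both table names are hypothetical):

-- moves all rows from the daily table into the weekly one,
-- leaving the daily table empty so it can be dropped
ALTER TABLE my_table_week_2021_22 APPEND FROM my_table_name_2021_06_03;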

getting second difference with sql statement

I have a table in PostgreSQL which stores a timestamp with time zone for every row inserted.
How can I use a PostgreSQL function to find the difference in seconds between the timestamp in one of the rows already inserted and the current PostgreSQL server timestamp?
Assuming that the column name is ts and the table name is t, you can query like this:
select current_timestamp - max(ts) from t;
If the table contains a large amount of data, this query will be very slow. In that case, you should have an index on the timestamp column.
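Note that subtracting two timestamps yields an interval, not a number. To get the difference as seconds, as the question asks, wrap the subtraction in extract(epoch from ...); ts and t are the same assumed names as above:

-- epoch of an interval is its total length in seconds
select extract(epoch from current_timestamp - max(ts)) as seconds_ago from t;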

PostgreSQL does not order timestamp column correctly

I have a table in a PostgreSQL database with a column of type TIMESTAMP WITHOUT TIME ZONE. I need to order the records by this column, and apparently PostgreSQL has some trouble doing it, as both
...ORDER BY time_column
and
...ORDER BY time_column DESC
give me the same order of elements for my 3-element sample of records, which share the same time_column value except for the milliseconds.
It seems that the sort does not consider the milliseconds in the value.
I am sure the milliseconds are in fact stored in the database because when I fetch the records, I can see them in my DateTime field.
When I first load all the records and then order them by the time_column in memory, the result is correct.
Am I missing some option to make the ordering behave correctly?
EDIT: I was apparently missing a lot. The problem was not in PostgreSQL, but in NHibernate stripping the milliseconds off the DateTime property.
It's a foolish notion that PostgreSQL wouldn't be able to sort timestamps correctly.
Run a quick test and rest assured:
CREATE TEMP TABLE t (x timestamp without time zone);
INSERT INTO t VALUES
('2012-03-01 23:34:19.879707')
,('2012-03-01 23:34:19.01386')
,('2012-03-01 23:34:19.738593');
SELECT x FROM t ORDER by x DESC;
SELECT x FROM t ORDER by x;
q.e.d.
Then try to find out what's really happening in your query. If you can't, post a test case and you will be helped presto pronto.
Try casting your column to ::timestamp, like this:
SELECT * FROM table_name
ORDER BY time_column::timestamp;