Why doesn't Spark save a timestamp based on the epoch in Postgres?

Given a table in PostgreSQL such as:
CREATE TABLE events (
name VARCHAR(50),
time TIMESTAMP -- no time zone, assumed to be UTC
);
I'm inserting events with Spark:
val timestamp = new Timestamp(1000000000000L)
val df = Seq(("test",timestamp)).toDF("name", "time")
// Ensure Spark generated the right timestamp
val timestampInDf = df.collect().head.getAs[Timestamp]("time")
println(timestampInDf) // 2001-09-09 03:46:40.0 i.e. display for my timezone (Europe/Paris), GMT+2:00
println(timestampInDf.getTime) // 1000000000000
df.write.mode(SaveMode.Append).jdbc(url, tableName, properties)
Then querying the timestamp in postgres:
SELECT name, time, EXTRACT(EPOCH FROM time) AS epoch FROM events
Which returns
name |time |epoch |
---------+-----------------------+----------+
test2 |2001-09-09 03:46:40.000|1000007200|
There is a 2-hour offset (corresponding to my time zone) from the timestamp I expected to save.
I'd expect the timestamp to be saved based on the epoch time. Instead, it looks like Spark (or Postgres) took the display time, assumed it was in UTC (it was not), and saved the corresponding epoch time (hence the 7200 additional seconds).
What is the reason for this behavior?
What is the proper way to save a timestamp (without time zone information) with Spark?

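This is caused by how the PostgreSQL JDBC driver handles time zones: for a timestamp (without time zone) column, it sends the Timestamp as the local date and time in the JVM's default time zone, not as an epoch value. Use -Duser.timezone=UTC to set your app time zone to UTC and avoid the offset; with Spark, the option has to reach both the driver and the executor JVMs.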
More details on PostgreSQL date / time can be found at https://jdbc.postgresql.org/documentation/head/java8-date-time.html
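The arithmetic behind the 7200-second shift can be reproduced outside Spark. The following is a minimal Python sketch, not the driver's actual code: the function name `epoch_seen_by_postgres` is made up for illustration, and the fixed UTC+2 offset stands in for Europe/Paris in September 2001.

```python
from datetime import datetime, timedelta, timezone

EPOCH_MS = 1_000_000_000_000
# Assumption: a fixed +02:00 offset models Europe/Paris summer time.
PARIS_SUMMER = timezone(timedelta(hours=2))


def epoch_seen_by_postgres(epoch_ms, jvm_zone):
    """Model a write into a 'timestamp without time zone' column.

    The instant is rendered as wall-clock time in the JVM zone, the zone
    is dropped, and EXTRACT(EPOCH ...) later reads that wall-clock value
    as if it were UTC.
    """
    instant = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    wall_clock = instant.astimezone(jvm_zone).replace(tzinfo=None)
    stored_epoch = int(wall_clock.replace(tzinfo=timezone.utc).timestamp())
    return wall_clock, stored_epoch


print(epoch_seen_by_postgres(EPOCH_MS, PARIS_SUMMER))
# (datetime.datetime(2001, 9, 9, 3, 46, 40), 1000007200)
print(epoch_seen_by_postgres(EPOCH_MS, timezone.utc))
# (datetime.datetime(2001, 9, 9, 1, 46, 40), 1000000000)
```

With the zone set to UTC, the wall-clock value and the epoch agree, which is the effect the -Duser.timezone=UTC setting aims for.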

Related

Convert local time to UTC for all rows postgres

Currently my table has stored the time in local time (+11:00 Sydney Australia) and I would like to convert all the values in the time column to UTC time.
The table format looks like:
Time             |Suburb    |
-----------------+----------+
2022-10-01 00:00 |Cheltenham|
2022-10-31 23:59 |Epping    |
What I am hoping to get is:
Time             |Suburb    |
-----------------+----------+
2022-09-30 14:00 |Cheltenham|
2022-10-31 14:00 |Epping    |
Is there a way that I can update all the rows in the time column? I have tried using the subquery
UPDATE time_table
SET time = (Select timezone('Australia/Sydney', time) from time_table)
However, it errors with "more than one row returned by a subquery used in expression". I realise that the subquery can only return 1 result, however is there a way where I can update all the rows in the Time column?
Provided that the data type of the column is timestamp (without time zone), you could run
UPDATE tab
SET time = time AT TIME ZONE 'Australia/Sydney' AT TIME ZONE 'UTC';
First, the timestamp is interpreted as Australian time and converted to the corresponding timestamp with time zone. The second AT TIME ZONE then yields the timestamp that a UTC clock shows at that instant.
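The two-step conversion can be checked outside the database. This is a small Python sketch that mirrors the two AT TIME ZONE steps; the helper name `local_to_utc` is hypothetical, and it assumes the system's IANA time zone database is available to `zoneinfo`.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc(naive_local, zone_name):
    """Interpret a naive timestamp in zone_name, return the naive UTC value."""
    # Step 1: attach the zone (timestamp -> timestamp with time zone).
    aware = naive_local.replace(tzinfo=ZoneInfo(zone_name))
    # Step 2: read the same instant off a UTC clock and drop the zone again.
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


rows = [datetime(2022, 10, 1, 0, 0), datetime(2022, 10, 31, 23, 59)]
for row in rows:
    print(row, "->", local_to_utc(row, "Australia/Sydney"))
# 2022-10-01 00:00:00 -> 2022-09-30 14:00:00
# 2022-10-31 23:59:00 -> 2022-10-31 12:59:00
```

The second row comes out as 12:59 and not 14:00 because Sydney switched to daylight saving time (UTC+11) on 2 October 2022. Using the zone name instead of a fixed offset is what makes the conversion pick the right offset for each row.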

Postgres cast to TIMESTAMPTZ

What is the behavior of PostgreSQL when we cast a DATE to TIMESTAMP or to TIMESTAMPTZ?
Which time zone is used?
That of the PostgreSQL server
That of the client that runs the query (current session)
If you cast a date to a timestamp, time zones don't play a role, because both data types are without a time zone. The resulting timestamp will be the beginning of the day.
If you cast a date to timestamp with time zone, the resulting timestamp will be the beginning of the day in the time zone defined by the parameter timezone in your current session.
SHOW timezone;
TimeZone
---------------
Europe/Vienna
(1 row)
SELECT CAST (DATE '2021-01-15' AS timestamp);
timestamp
---------------------
2021-01-15 00:00:00
(1 row)
SELECT CAST (DATE '2021-01-15' AS timestamp with time zone);
timestamptz
------------------------
2021-01-15 00:00:00+01
(1 row)
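The role of the session time zone in the second cast can be illustrated with a short Python sketch. The function name `date_to_timestamptz` is made up for this example, and `zoneinfo` is assumed to have access to the system's time zone database.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def date_to_timestamptz(d, session_zone):
    """Midnight of d in the session time zone, as an aware datetime."""
    return datetime.combine(d, time.min, tzinfo=ZoneInfo(session_zone))


vienna = date_to_timestamptz(date(2021, 1, 15), "Europe/Vienna")
sofia = date_to_timestamptz(date(2021, 1, 15), "Europe/Sofia")

print(vienna.isoformat())                           # 2021-01-15T00:00:00+01:00
print(vienna.astimezone(timezone.utc).isoformat())  # 2021-01-14T23:00:00+00:00
print(vienna - sofia)                               # 1:00:00
```

The same date maps to two different instants, one hour apart, depending on the session zone, which is why setting the session time zone explicitly matters for this cast.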
Casting a date to timestamp appends the time 00:00:00.0 to the date.
For the cast to timestamp with time zone, the time zone of the current server session is used.
By default, this is the time zone setting of the PostgreSQL server.
You can change the time zone of the current server session like this:
set time zone 'Europe/Sofia';
I do not think that the time zone of the client has any effect. More on this issue here.
Edit
As Adrian Klaver noted, "When dealing with timestamps it's best to assume the worst". It is therefore better to set the session time zone explicitly.
Given the situation where one needs to change the column type from integer (as epoch seconds) to timestamptz:
casting directly from integer to timestamptz is not possible,
but you can do this:
ALTER TABLE table_name ALTER COLUMN column_name TYPE timestamptz
USING column_name::abstime::timestamptz
Note that the abstime type was removed in PostgreSQL 12; on newer versions, USING to_timestamp(column_name) does the same conversion.

Iterate through all rows, convert timestamp stored in database to UTC version and update it back

I have a Postgres database table in which I have stored timestamps that are actually the "Australia/Melbourne" time zone version of the timestamps, and I want to update them to the "UTC" version. How can I do that in a single Postgres function?
I have looked at functions, as I think we could iterate through the rows in the table and execute the update in a FOR loop. I only know the query that returns the converted values; it does not update them:
SELECT timestamp_column AT TIME ZONE 'Australia/Melbourne' AT TIME ZONE 'UTC'
from my_schema.my_table;
Current table:
timestamp
---------------
2018-08-27 16:15:25.348
2018-05-15 13:52:12.052
2018-05-15 14:28:58.239
...
...
Expected table:
timestamp
----------------
2018-08-27 06:15:25.348
2018-05-15 03:52:12.052
2018-05-15 04:28:58.239
...
...

Date and time in UTC - how to store them in postgres?

I am getting my data, date and time in UTC, in a CSV file with the two values in separate columns. Since I will need to convert them to the date and time of the place where I live (currently UTC+2 in summer), and maybe to some other zones, I was wondering what the best practice is for the data type when inserting this data into Postgres. Should I place both values in a single column or keep them separate as types date and time? And if I combine them, should I use timestamp or timestamptz (or something else)?
Use timestamptz. It stores your timestamp in UTC and displays it to the client according to the session's time zone setting.
https://www.postgresql.org/docs/current/static/datatype-datetime.html
For timestamp with time zone, the internally stored value is always in
UTC (Universal Coordinated Time, traditionally known as Greenwich Mean
Time, GMT). An input value that has an explicit time zone specified is
converted to UTC using the appropriate offset for that time zone. If
no time zone is stated in the input string, then it is assumed to be
in the time zone indicated by the system's TimeZone parameter, and is
converted to UTC using the offset for the timezone zone.
When a timestamp with time zone value is output, it is always
converted from UTC to the current timezone zone, and displayed as
local time in that zone. To see the time in another time zone, either
change timezone or use the AT TIME ZONE construct (see Section 9.9.3).
Update: another good point from Lukasz that I should mention:
Also in favor of single column is the fact that if you would store
both date and time in separate columns you would still need to combine
them and convert to timestamp if you wanted to change time zone of
date.
Otherwise, date '2017-12-31' with time '23:01:01' would, in another time zone, be not only a different time but also a different date, with the year, month, and day all different.
Another update: as Laurenz points out, don't forget this part of the docs quoted above:
An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone.
This means you have to manage the input dates carefully. For example:
t=# create table t(t timestamptz);
CREATE TABLE
t=# set timezone to 'GMT+5';
SET
t=# insert into t select '2017-01-01 00:00:00';
INSERT 0 1
t=# insert into t select '2017-01-01 00:00:00' at time zone 'UTC';
INSERT 0 1
t=# insert into t select '2017-01-01 00:00:00+02';
INSERT 0 1
t=# select * from t;
t
------------------------
2017-01-01 00:00:00-05
2017-01-01 05:00:00-05
2016-12-31 17:00:00-05
(3 rows)
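The input rule from the documentation (no zone in the input means the session zone is assumed, an explicit offset is honored) can be modeled in a few lines of Python. This is only a sketch of the rule, not of PostgreSQL's parser: the name `to_stored_utc` is hypothetical, the fixed UTC-5 offset stands in for the POSIX-style 'GMT+5' session setting, and the offset is written as +02:00 because that is the form `datetime.fromisoformat` accepts across Python versions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the POSIX-style name 'GMT+5' means five hours west of UTC.
SESSION_ZONE = timezone(timedelta(hours=-5))


def to_stored_utc(text, session_zone):
    """Return the UTC instant a timestamptz column would store for text."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No zone in the input: assume the session time zone.
        parsed = parsed.replace(tzinfo=session_zone)
    return parsed.astimezone(timezone.utc)


for text in ("2017-01-01 00:00:00", "2017-01-01 00:00:00+02:00"):
    stored = to_stored_utc(text, SESSION_ZONE)
    shown = stored.astimezone(SESSION_ZONE)
    print(text, "| stored:", stored.isoformat(), "| shown:", shown.isoformat())
# 2017-01-01 00:00:00 | stored: 2017-01-01T05:00:00+00:00 | shown: 2017-01-01T00:00:00-05:00
# 2017-01-01 00:00:00+02:00 | stored: 2016-12-31T22:00:00+00:00 | shown: 2016-12-31T17:00:00-05:00
```

The "shown" values correspond to the first and third rows of the psql output above.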

What data type should be used to store date and time in Cassandra?

What data type should I use to store a date and time in Cassandra?
CREATE TABLE testTable (
dateValue date,
time timestamp
)
and my insert statements would look like this:
insert into caliper.log_per_day ( timeStampValue,dateValue ) values ('2015-12-30 16:10:31','2015-12-30');
I want to store both date and time in one column, like this: '2015-12-30 16:10:31'.
But if I use timestamp, it is stored like this: '2015-12-30 04:10:31+0530'
Note: the primary key and other details are omitted here; please ignore them.
Cqlsh will display timestamps in the following format by default:
yyyy-mm-dd HH:mm:ssZ
The Z in this format refers to an RFC-822 4-digit time zone.
If no time zone is supplied, the current time zone for the Cassandra
server node will be used.
So if you don't want to store it this way, you can store it as varchar.