I am loading a Parquet file into Redshift. The Parquet file has a timestamp column.
Question: Is there any way to specify which time zone to use as context when converting the Parquet timestamp (a specific instant in time) to a TIMESTAMP in Redshift (an abstract date/time without time zone context)?
Put differently:
When I INSERT '2019-01-01 00:00'::timestamp into a TIMESTAMP column, I get exactly 2019-01-01 00:00 back, no matter what my session timezone is. I would imagine CSV is similar.
When I load from Parquet, the only way I can get 2019-01-01 00:00 into a TIMESTAMP column is if the Parquet timestamp represents 2019-01-01 00:00 _UTC_. I would like to get the same result in Redshift from a Parquet timestamp like 2019-01-01 00:00 _EDT_ (which would be 04:00 or 05:00 UTC).
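To make the desired conversion concrete, here is a minimal Python sketch (not a Redshift feature): view the stored instant in the target zone, then keep only the wall-clock fields. The helper name `to_wall_clock` and the fixed UTC-05:00 offset standing in for US Eastern time are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def to_wall_clock(instant: datetime, zone: timezone) -> datetime:
    """View an aware instant in `zone`, then drop the zone to get a naive wall-clock value."""
    return instant.astimezone(zone).replace(tzinfo=None)

# Assumption: US Eastern standard time modelled as a fixed offset (UTC-05:00).
eastern = timezone(timedelta(hours=-5))

# The instant as Parquet would hold it: 2019-01-01 05:00 UTC.
parquet_instant = datetime(2019, 1, 1, 5, 0, tzinfo=timezone.utc)

local_value = to_wall_clock(parquet_instant, eastern)     # what the question wants stored
utc_value = to_wall_clock(parquet_instant, timezone.utc)  # the UTC wall clock described in the question
print(local_value)
print(utc_value)
```

The two results differ by exactly the zone offset, which is the shift the question would like to control at load time.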
Currently my table stores the time in local time (+11:00, Sydney, Australia), and I would like to convert all the values in the time column to UTC.
The table format looks like:
Time             | Suburb
-----------------+-----------
2022-10-01 00:00 | Cheltenham
2022-10-31 23:59 | Epping
What I am hoping to get is:
Time             | Suburb
-----------------+-----------
2022-09-30 14:00 | Cheltenham
2022-10-31 14:00 | Epping
Is there a way that I can update all the rows in the time column? I have tried using the subquery
UPDATE time_table
SET time = (SELECT timezone('Australia/Sydney', time) FROM time_table)
However, it fails with the error "more than one row returned by a subquery used as an expression". I realise that the subquery can only return one row, but is there a way to update all the rows in the Time column?
Provided that the data type of the column is timestamp (without time zone), you could run:
UPDATE tab
SET time = time AT TIME ZONE 'Australia/Sydney' AT TIME ZONE 'UTC';
First, the timestamp is interpreted as Australian time and converted to the corresponding timestamp with time zone, then we get the timestamp that a UTC clock shows at that time.
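The two steps can be mimicked outside the database. The following is a minimal Python sketch, not part of the answer's SQL: the helper name `local_to_utc` is hypothetical, and the fixed +11:00 offset is taken from the question as an assumption. A named zone such as Australia/Sydney applies daylight saving rules, so the database result can differ by an hour for dates outside daylight saving time.

```python
from datetime import datetime, timedelta, timezone

def local_to_utc(naive_local: datetime, zone: timezone) -> datetime:
    """Mimic `ts AT TIME ZONE zone AT TIME ZONE 'UTC'` for a naive timestamp."""
    aware = naive_local.replace(tzinfo=zone)  # step 1: interpret the value as local time
    # step 2: read what a UTC clock shows at that instant, without zone information
    return aware.astimezone(timezone.utc).replace(tzinfo=None)

# Assumption: fixed +11:00 offset, as stated in the question (no DST rules applied).
sydney_fixed = timezone(timedelta(hours=11))

rows = [datetime(2022, 10, 1, 0, 0), datetime(2022, 10, 31, 23, 59)]
converted = [local_to_utc(ts, sydney_fixed) for ts in rows]
for before, after in zip(rows, converted):
    print(before, "->", after)
```

Because the conversion is a per-row expression, it needs no subquery, which is why the single UPDATE statement handles every row.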
Given a table in PostgreSQL such as:
CREATE TABLE events (
name VARCHAR(50),
time TIMESTAMP -- no time zone, assumed to be UTC
);
I'm inserting events with Spark:
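import java.sql.Timestamp
import org.apache.spark.sql.SaveMode
import spark.implicits._ // assumes a SparkSession named `spark` is in scope

val timestamp = new Timestamp(1000000000000L)
val df = Seq(("test", timestamp)).toDF("name", "time")
// Ensure Spark generated the right timestamp
val timestampInDf = df.collect().head.getAs[Timestamp]("time")
println(timestampInDf) // 2001-09-09 03:46:40.0, i.e. displayed in my time zone (Europe/Paris, GMT+2:00)
println(timestampInDf.getTime) // 1000000000000
// url, tableName and properties are the JDBC connection settings (not shown)
df.write.mode(SaveMode.Append).jdbc(url, tableName, properties)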
Then I query the timestamp in Postgres:
SELECT name, time, EXTRACT(EPOCH FROM time) AS epoch FROM events;
which returns:
name |time |epoch |
---------+-----------------------+----------+
test2 |2001-09-09 03:46:40.000|1000007200|
There is a two-hour offset (corresponding to my time zone) from the timestamp I expected to save.
I expected the timestamp to be saved based on the epoch time. Instead, it looks like Spark (or Postgres) took the display time, assumed it was UTC (it was not), and saved the corresponding epoch time (hence the 7200 additional seconds).
What is the reason for this behavior?
What is a proper way to save a timestamp (without timezone information) with Spark?
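This is caused by the PostgreSQL JDBC driver, which converts java.sql.Timestamp values using the JVM's default time zone. Start your application with -Duser.timezone=UTC so that the JVM default is UTC and no offset is applied. With Spark, this typically needs to be set on both the driver and the executors.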
More details on PostgreSQL date / time can be found at https://jdbc.postgresql.org/documentation/head/java8-date-time.html
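The 7200-second shift from the question can be reproduced with plain arithmetic. This is a minimal Python sketch of the effect, not of the driver's actual code path: the helper name `stored_epoch` is hypothetical, and Europe/Paris in September is modelled as a fixed +02:00 offset.

```python
from datetime import datetime, timedelta, timezone

EPOCH_MS = 1_000_000_000_000  # value passed to new Timestamp(...) in the question

# Assumption: Europe/Paris in September as a fixed +02:00 offset.
jvm_zone = timezone(timedelta(hours=2))

def stored_epoch(epoch_ms: int, client_zone: timezone) -> int:
    """Render the instant as wall-clock time in client_zone, then read that
    wall-clock value back as if it were UTC."""
    instant = datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc)
    wall_clock = instant.astimezone(client_zone).replace(tzinfo=None)
    return int(wall_clock.replace(tzinfo=timezone.utc).timestamp())

paris_result = stored_epoch(EPOCH_MS, jvm_zone)    # client zone is UTC+2
utc_result = stored_epoch(EPOCH_MS, timezone.utc)  # client zone is UTC
print(paris_result)
print(utc_result)
```

With the client zone set to UTC, the wall-clock value and the UTC value coincide, so the epoch read back from the column matches the one that was written.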
I have an Excel column with timestamps like 2022-12-26 23:59:59. I want to interpret these values in my local time zone and store them in Postgres as epoch seconds in a BIGINT column, e.g. 1672117199. Which query can I use to achieve this?
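The arithmetic behind the two values in the question can be checked with a short Python sketch. The local zone is not stated in the question; a fixed UTC-05:00 offset is assumed here because it is the offset that makes 2022-12-26 23:59:59 correspond to 1672117199. The helper name `local_text_to_epoch` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the local zone is a fixed UTC-05:00 offset (inferred from the
# two values in the question, not stated there).
local_zone = timezone(timedelta(hours=-5))

def local_text_to_epoch(text: str, zone: timezone) -> int:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as local time and return epoch seconds."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=zone).timestamp())

epoch = local_text_to_epoch("2022-12-26 23:59:59", local_zone)
print(epoch)
```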
I am getting my data, date and time in UTC, as a CSV file with the two values in separate columns. I will need to convert them to the date and time of the place where I live (currently UTC+2 in summer), and maybe to some other zones, so I was wondering what the best practice is for choosing data types when inserting this data into Postgres. Should I put both values in a single column, or keep them separate as types date and time? And if a single column, should I use timestamp or timestamptz (or something else)?
Use timestamptz. It stores your timestamp in UTC and displays it to the client according to the session's time zone setting.
https://www.postgresql.org/docs/current/static/datatype-datetime.html
For timestamp with time zone, the internally stored value is always in
UTC (Universal Coordinated Time, traditionally known as Greenwich Mean
Time, GMT). An input value that has an explicit time zone specified is
converted to UTC using the appropriate offset for that time zone. If
no time zone is stated in the input string, then it is assumed to be
in the time zone indicated by the system's TimeZone parameter, and is
converted to UTC using the offset for the timezone zone.
When a timestamp with time zone value is output, it is always
converted from UTC to the current timezone zone, and displayed as
local time in that zone. To see the time in another time zone, either
change timezone or use the AT TIME ZONE construct (see Section 9.9.3).
Update: another good point from Lukasz that I should mention:
Also in favor of single column is the fact that if you would store
both date and time in separate columns you would still need to combine
them and convert to timestamp if you wanted to change time zone of
date.
Otherwise, a date '2017-12-31' with a time '23:01:01' would, in another time zone, have not only a different time but also a different date, with YEAR, MONTH and DAY all different.
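A minimal Python sketch of that rollover, assuming the stored values are UTC and the viewer sits at a fixed +02:00 offset (both are assumptions for illustration):

```python
from datetime import date, datetime, time, timedelta, timezone

stored_date = date(2017, 12, 31)
stored_time = time(23, 1, 1)

# Assumption: the stored values are UTC and the viewer is at a fixed +02:00 offset.
viewer_zone = timezone(timedelta(hours=2))

# The two columns must be combined into one instant before the zone can be changed.
combined = datetime.combine(stored_date, stored_time, tzinfo=timezone.utc)
local = combined.astimezone(viewer_zone)
print(local.date(), local.time())
```

Shifting only the time column would leave the date column at 2017-12-31, which is why the conversion has to operate on the combined value.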
Another update: as Laurenz noted, don't forget this part of the docs quote above:
An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone.
This means you have to manage the input dates carefully. E.g.:
t=# create table t(t timestamptz);
CREATE TABLE
t=# set timezone to 'GMT+5';
SET
t=# insert into t select '2017-01-01 00:00:00';
INSERT 0 1
t=# insert into t select '2017-01-01 00:00:00' at time zone 'UTC';
INSERT 0 1
t=# insert into t select '2017-01-01 00:00:00+02';
INSERT 0 1
t=# select * from t;
t
------------------------
2017-01-01 00:00:00-05
2017-01-01 05:00:00-05
2016-12-31 17:00:00-05
(3 rows)
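The three rows can be reproduced step by step in a minimal Python sketch of the rules quoted from the docs. The helper names are hypothetical, 'GMT+5' is modelled as a fixed UTC-05:00 offset (POSIX sign convention), and treating the untyped literal in the second insert as timestamptz is an assumption that matches the output shown.

```python
from datetime import datetime, timedelta, timezone

# 'GMT+5' is a POSIX-style name, so the session is five hours *behind* UTC.
session = timezone(timedelta(hours=-5))

def as_timestamptz(value: datetime) -> datetime:
    """Naive input is taken to be in the session zone; aware input keeps its offset."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=session)
    return value.astimezone(timezone.utc)  # stored as UTC

def at_time_zone_utc(value: datetime) -> datetime:
    """timestamptz AT TIME ZONE 'UTC' yields a naive timestamp showing UTC wall-clock time."""
    return value.astimezone(timezone.utc).replace(tzinfo=None)

naive = datetime(2017, 1, 1, 0, 0)
inserted = [
    as_timestamptz(naive),                                    # no zone in the input
    as_timestamptz(at_time_zone_utc(as_timestamptz(naive))),  # ... at time zone 'UTC'
    as_timestamptz(naive.replace(tzinfo=timezone(timedelta(hours=2)))),  # explicit +02
]
shown = [row.astimezone(session).isoformat(sep=" ") for row in inserted]
print("\n".join(shown))
```

The second row ends up shifted because the naive result of AT TIME ZONE is interpreted in the session zone a second time when it is stored.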
I have a problem.
I migrated a database table from Rails.
The table also includes timestamp with time zone columns.
When I insert data into the database, the timestamp columns store the current time as (e.g. 2012-08-09 12:00:00 UTC+6:30).
I think it means 2012-08-09 18:30.
But when I retrieve this data from Rails 3.2.6, it is displayed as 2012-08-09 5:30.
Why is it displayed one hour off from the actual time? Any ideas, please?
At a guess, you're storing it with one timezone setting and retrieving it with a different one.
For example, if Rails thinks you're in UTC+6:30 but PostgreSQL thinks you're in UTC+5:30, and if Rails sends dates to Pg timezone qualified but reads them from Pg with the assumption that they're in local time, this would happen. It's safest to make sure your database driver always reads and writes dates timezone-qualified.
Given the one-hour gap, I'm wondering if daylight saving time is involved, but it could also just be that your time zone is off by one hour.
regress=# create table test ( x timestamp with time zone );
CREATE TABLE
regress=# insert into test (x) values ('2012-08-09 12:00:00 UTC+6:30');
INSERT 0 1
regress=# SET TIMEZONE = '-5:30';
SET
regress=# select * from test;
x
---------------------------
2012-08-10 00:00:00+05:30
(1 row)
regress=# SET TIMEZONE = '-6:30';
SET
regress=# select * from test;
x
---------------------------
2012-08-10 01:00:00+06:30
(1 row)
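The session output can be checked with fixed offsets. This minimal Python sketch assumes the POSIX sign convention for the literal 'UTC+6:30' (positive means west of Greenwich), which is consistent with the rows shown; the helper name `display` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# POSIX-style reading: 'UTC+6:30' means 6 h 30 min *west* of Greenwich.
posix_input_zone = timezone(timedelta(hours=-6, minutes=-30))
stored = datetime(2012, 8, 9, 12, 0, tzinfo=posix_input_zone).astimezone(timezone.utc)

def display(instant: datetime, hours: int, minutes: int) -> str:
    """Show a stored instant in a session zone given as a fixed offset east of UTC."""
    zone = timezone(timedelta(hours=hours, minutes=minutes))
    return instant.astimezone(zone).isoformat(sep=" ")

print(stored.isoformat(sep=" "))  # the instant as stored, in UTC
print(display(stored, 5, 30))     # session displayed as +05:30
print(display(stored, 6, 30))     # session displayed as +06:30
```

The same stored instant is shown one hour apart under the two session settings, which is the kind of mismatch suspected in the question.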
Alternately, maybe your database is in time zone '-1:00' and your application is stripping off the time zone when it reads the date, so the date appears to be off by one hour. It's hard to say with the available information.
To really help you it would be necessary for you to show:
The code that inserts the date
The INSERT statement that really inserts the date, captured by enabling log_statement = 'all' in postgresql.conf, along with any SET TIMEZONE statements that session ran before the INSERT.
The result of SELECTing that column from the database after a SET TIMEZONE = 'UTC'
and the code that reads the date
The time
2012-08-09 12:00:00 UTC+6:30
is in the time zone UTC+6:30 (GMT+6:30) when you save it.
When you retrieve it, you get the time in GMT.
Both denote the same instant, since 2012-08-09 5:30 GMT = 12:00:00 GMT+6:30.
EDIT:
This is for @CraigRinger...
Let's say UTC is London and UTC+6:30 is Rangoon, so:
18:30 in Rangoon = 12:00 in London
Subtracting 6:30 from both:
12:00 in Rangoon = 5:30 in London
12:00 UTC+6:30 = 5:30 UTC
Data stored in UTC+6:30 = data retrieved in UTC
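The arithmetic in this answer can be written out as a minimal Python sketch. It assumes UTC+6:30 means six and a half hours ahead of UTC (the ISO 8601 sign convention), which is the reading this answer uses.

```python
from datetime import datetime, timedelta, timezone

# Assumption: +06:30 means 6 h 30 min *ahead* of UTC (ISO 8601 sign convention).
rangoon = timezone(timedelta(hours=6, minutes=30))

saved = datetime(2012, 8, 9, 12, 0, tzinfo=rangoon)
retrieved = saved.astimezone(timezone.utc)

print(retrieved.isoformat(sep=" "))
same_instant = saved == retrieved  # aware datetimes compare by instant, not by wall clock
print(same_instant)
```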