from_utc_timestamp not taking daylight saving time into account - pyspark

I need to convert a UTC timestamp to MST or EST, but the conversion is not taking daylight saving time into account.
Also, is it better to use MST or EST, or should we use "America/Phoenix" or "America/New_York"?
Please help.
Thanks,
Naveed

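Use America/New_York for the Eastern time zone and America/Phoenix for the Mountain time zone; Spark's from_utc_timestamp function then handles daylight saving time automatically. Note that America/Phoenix stays on MST all year, because most of Arizona does not observe daylight saving time; for Mountain time with daylight saving, use America/Denver.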
Example:
#set timezone for the session
spark.conf.set('spark.sql.session.timeZone', 'America/New_York')
#during daylight saving time, i.e. -4:00 hrs from UTC
spark.sql("""select current_timestamp as current_ts,from_utc_timestamp(current_timestamp,"America/New_York") utc_to_est""").show(10,False)
#+-----------------------+-----------------------+
#|current_ts |utc_to_est |
#+-----------------------+-----------------------+
#|2020-05-10 19:04:28.369|2020-05-10 15:04:28.369|
#+-----------------------+-----------------------+
#outside daylight saving time the offset is -5:00 hrs from UTC
spark.sql("""select from_utc_timestamp("2020-12-10 15:04:28.369","America/New_York") utc_to_est""").show(10,False)
#+-----------------------+
#|utc_to_est |
#+-----------------------+
#|2020-12-10 10:04:28.369|
#+-----------------------+
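The same offsets can be checked outside Spark. The following is a minimal sketch in plain Python using the standard zoneinfo module, not Spark itself; the helper name utc_to_zone is hypothetical, and it assumes the IANA time zone database is available on the machine. It also shows why America/Phoenix and America/Denver differ in summer.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_to_zone(utc_string: str, zone_name: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as UTC and convert it to zone_name."""
    utc_dt = datetime.strptime(utc_string, "%Y-%m-%d %H:%M:%S")
    utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(zone_name))


# Region names carry the DST rules, so the offset changes with the date.
summer = utc_to_zone("2020-05-10 19:04:28", "America/New_York")
winter = utc_to_zone("2020-12-10 15:04:28", "America/New_York")
print(summer.isoformat())  # 2020-05-10T15:04:28-04:00
print(winter.isoformat())  # 2020-12-10T10:04:28-05:00

# America/Phoenix stays at UTC-7 all year; America/Denver moves to UTC-6 in summer.
phoenix = utc_to_zone("2020-05-10 19:04:28", "America/Phoenix")
denver = utc_to_zone("2020-05-10 19:04:28", "America/Denver")
print(phoenix.isoformat())  # 2020-05-10T12:04:28-07:00
print(denver.isoformat())   # 2020-05-10T13:04:28-06:00
```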

Related

Convert existing UTC field to pst

I'm reading this SO post and am confused. Suppose I have a field that I've been told from our data engineering team is in UTC time. How can I cast it as PST (Pacific time, California)?
select
'2021-02-12 05:27:51' as if_this_is_already_utc_then,
'2021-02-12 05:27:51' at time zone 'pst' as is_this_pst,
'2021-02-12 05:27:51' at time zone 'utc' at time zone 'pst' as or_is_this_pst
Returns:
The two attempts at PST show different timestamps. Which is correct, if any? I'm confused because on the linked post they first seem to convert to UTC and then again to EST. Which do I need here if I know that the original timestamp is UTC and I want to get it to PST California?

Postgres "time zone at" isn't respecting mountain standard time when converting

I have a column starts_at with a type of TIMESTAMP WITHOUT TIME ZONE because it's representing the time of an appointment and should not change during a DST shift.
However, our library that handles recurring appointments needs this time in UTC. I am attempting to convert starts_at to UTC, but am seeing that I'm getting times representing MDT (daylight savings time) rather than MST (standard time).
For example, take the following:
SELECT starts_at, timezone('America/Denver', starts_at) AS new_starts_at
I would expect to get the following result:
--------------------------------------------------
| starts_at | new_starts_at
--------------------------------------------------
| 2018-09-04 13:05:00 | 2018-09-04 20:05:00+00
Instead, I'm getting the following:
--------------------------------------------------
| starts_at | new_starts_at
--------------------------------------------------
| 2018-09-04 13:05:00 | 2018-09-04 19:05:00+00
new_starts_at should be returning in MST, which would be 2018-09-04 20:05:00+00. My impression was that using the Olson timezone (America/Denver) would inform Postgres of whether or not there was a DST shift in place. If I replace America/Denver with MST, I see the correct result.
I'm sure this is just a misunderstanding of Postgres timezone types on my part. That said, thanks in advance for the education!
The expression timezone('America/Denver', starts_at) interprets starts_at as Denver local time; the result is a timestamp with time zone.
When you output that value, it is converted to your session time zone, which is UTC.
13:05 in Denver is 19:05 in UTC, which happens to be your session time zone.
During daylight savings time, Denver is offset 6 hours from UTC.
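To make the arithmetic concrete, here is a small Python sketch of the same two interpretations. It is an analogue of what the answer describes, not PostgreSQL's implementation; the variable names are illustrative, and the fixed UTC-7 zone stands in for the MST abbreviation.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Naive value, comparable to TIMESTAMP WITHOUT TIME ZONE.
starts_at = datetime(2018, 9, 4, 13, 5)

# Interpreted as Denver local time: September falls in MDT (UTC-6).
denver_utc = starts_at.replace(tzinfo=ZoneInfo("America/Denver")).astimezone(timezone.utc)

# Interpreted with a fixed MST offset (UTC-7), ignoring daylight saving time.
mst = timezone(timedelta(hours=-7), "MST")
mst_utc = starts_at.replace(tzinfo=mst).astimezone(timezone.utc)

print(denver_utc)  # 2018-09-04 19:05:00+00:00
print(mst_utc)     # 2018-09-04 20:05:00+00:00
```

The region name follows the wall clock people in Denver actually use on that date, while the fixed offset keeps the appointment at UTC-7 regardless of the season.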

In SQL Server 2014 , how to convert given UTC date time to PST date time, when the server is in UTC

How can I convert a given UTC date time to a PST date time while accounting for daylight saving time in the calculation?
Note that the server I am hitting is in utc. I mean, GETDATE() = GETUTCDATE().
Also, we can't use AT TIME ZONE, as DB is on older SQL Server.
Thanks in advance for the help.
Search the site; for example: Convert Datetime column from UTC to local time in select statement
Read the link above; some of the responses there also cover daylight saving time.

PostgreSQL date() with timezone

I'm having an issue selecting dates properly from Postgres - they are being stored in UTC, but not converting with the Date() function properly.
Converting the timestamp to a date gives me the wrong date if it's past 4pm PST.
2012-06-21 should be 2012-06-20 in this case.
The starts_at column datatype is timestamp without time zone. Here are my queries:
Without converting to PST timezone:
Select starts_at from schedules where id = 40;
starts_at
---------------------
2012-06-21 01:00:00
Converting gives this:
Select (starts_at at time zone 'pst') from schedules where id = 40;
timezone
------------------------
2012-06-21 02:00:00-07
But neither converts to the correct date in the timezone.
Basically what you want is:
$ select starts_at AT TIME ZONE 'UTC' AT TIME ZONE 'US/Pacific' from schedules where id = 40
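The two-step conversion can be sketched in Python as follows. This is only an analogue of the SQL above, under the assumption that the stored value is UTC; it uses America/Los_Angeles, the IANA region name for the zone that US/Pacific aliases.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Naive value from the question, known to be stored in UTC.
starts_at = datetime(2012, 6, 21, 1, 0)

# Step 1: declare the naive value to be UTC (AT TIME ZONE 'UTC').
as_utc = starts_at.replace(tzinfo=timezone.utc)

# Step 2: convert to Pacific time (AT TIME ZONE 'US/Pacific').
pacific = as_utc.astimezone(ZoneInfo("America/Los_Angeles"))

print(starts_at.date())  # 2012-06-21
print(pacific)           # 2012-06-20 18:00:00-07:00
print(pacific.date())    # 2012-06-20
```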
I got the solution from the article below, which is straight GOLD!!! It explains this non-trivial issue very clearly; give it a read if you wish to understand PostgreSQL TZ management better.
Expressing PostgreSQL timestamps without zones in local time
Here is what is going on. First, you should know that the 'PST' time zone is 8 hours behind UTC, so for instance Jan 1st 2014, 4:30 PM PST (Wed, 01 Jan 2014 16:30:00 -0800) is equivalent to Jan 2nd 2014, 00:30 AM UTC (Thu, 02 Jan 2014 00:30:00 +0000). Any time after 4:00 PM in PST slips over to the next day when interpreted as UTC.
Also, as Erwin Brandstetter mentioned above, PostgreSQL has two timestamp data types, one with a time zone and one without.
If your timestamps include a timezone, then a simple:
$ select starts_at AT TIME ZONE 'US/Pacific' from schedules where id = 40
will work. However, if your timestamp is timezoneless, executing the above command will not work: you must FIRST convert your timezoneless timestamp to a timestamp with a time zone, namely UTC, and ONLY THEN convert it to your desired 'PST' or 'US/Pacific' (which are the same up to some daylight saving time issues. I think you should be fine with either).
Let me demonstrate with an example where I create a timezoneless timestamp. Let's assume for convenience that our local timezone is indeed 'PST' (if it weren't then it gets a tiny bit more complicated which is unnecessary for the purpose of this explanation).
Say I have:
$ select timestamp '2014-01-2 00:30:00' AS a, timestamp '2014-01-2 00:30:00' AT TIME ZONE 'UTC' AS b, timestamp '2014-01-2 00:30:00' AT TIME ZONE 'UTC' AT TIME ZONE 'PST' AS c, timestamp '2014-01-2 00:30:00' AT TIME ZONE 'PST' AS d
This will yield:
"a"=>"2014-01-02 00:30:00" (This is the timezoneless timestamp)
"b"=>"2014-01-02 00:30:00+00" (This is the UTC TZ timestamp, note that up to a timezone, it is equivalent to the timezoneless one)
"c"=>"2014-01-01 16:30:00" (This is the correct 'PST' TZ conversion of the UTC timezone, if you read the documentation postgresql will not print the actual TZ for this conversion)
"d"=>"2014-01-02 08:30:00+00"
The last timestamp is the reason for all the confusion regarding converting timezoneless timestamp from UTC to 'PST' in postgresql. When we write:
timestamp '2014-01-2 00:30:00' AT TIME ZONE 'PST' AS d
We are taking a timezoneless timestamp and trying to convert it to the 'PST' TZ (we implicitly assume that PostgreSQL will understand that we want it to convert the timestamp from a UTC TZ, but PostgreSQL has plans of its own!). In practice, PostgreSQL takes the timezoneless timestamp ('2014-01-2 00:30:00') and treats it as if it WERE ALREADY a 'PST' TZ timestamp (i.e. 2014-01-2 00:30:00 -0800) and converts that to the UTC timezone!!! So it actually pushes it 8 hours ahead instead of back! Thus we get (2014-01-02 08:30:00+00).
Anyway, this last (unintuitive) behavior is the cause of all the confusion. Read the article if you want a more thorough explanation; I actually got results which are a bit different than theirs on this last part, but the general idea is the same.
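The four results a, b, c and d can be reproduced with a short Python sketch. It mirrors the behavior described here rather than PostgreSQL's actual code, and it assumes 'PST' means a fixed UTC-8 offset; the variable names follow the column aliases in the query.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")  # fixed offset, no daylight saving

a = datetime(2014, 1, 2, 0, 30)                     # timezoneless timestamp
b = a.replace(tzinfo=timezone.utc)                  # declare it to be UTC
c = b.astimezone(PST).replace(tzinfo=None)          # UTC -> PST wall-clock time
d = a.replace(tzinfo=PST).astimezone(timezone.utc)  # treat a as PST, show in UTC

print(a)  # 2014-01-02 00:30:00
print(b)  # 2014-01-02 00:30:00+00:00
print(c)  # 2014-01-01 16:30:00
print(d)  # 2014-01-02 08:30:00+00:00
```

Attaching a zone to a naive value (replace) and converting between zones (astimezone) are different operations, which is exactly the distinction that makes d surprising.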
I don't see the exact type of starts_at in your question. You really should include this information, it is the key to the solution. I'll have to guess.
PostgreSQL always stores UTC time for the type timestamp with time zone internally. Input and output (display) are adjusted to the current timezone setting or to the given time zone. The effect of AT TIME ZONE also changes with the underlying data type. See:
Ignoring time zones altogether in Rails and PostgreSQL
If you extract a date from type timestamp [without time zone], you get the date for the current time zone. The day in the output will be the same as in the display of the timestamp value.
If you extract a date from type timestamp with time zone (timestamptz for short), the time zone offset is "applied" first. You still get the date for the current time zone, which agrees with the display of the timestamp. The same point in time translates to the next day in parts of Europe, when it is past 4 p.m. in California for instance. To get the date for a certain time zone, apply AT TIME ZONE first.
Therefore, what you describe at the top of the question contradicts your example.
I assume that starts_at is a timestamp [without time zone] and that the time on your server is set to local time. Test with:
SELECT now();
Does it display the same time as a clock on your wall? If yes (and the db server is running with correct time), the timezone setting of your current session agrees with your local time zone. If no, you may want to visit the setting of timezone in your postgresql.conf or your client for the session. Details in the manual.
Be aware that the time zone offset uses the opposite sign of what's displayed in timestamp literals. See:
Peculiar time zone handling in a Postgres database
To get your local date from starts_at just
SELECT starts_at::date
Tantamount to:
SELECT date(starts_at)
BTW, your local time is at UTC-7 right now, not UTC-8, because daylight savings time is in effect (not among the brighter ideas of the human race).
Pacific Standard Time (PST) is 8 hours behind UTC (Coordinated Universal Time), but during daylight saving periods (like now) Pacific time is only 7 hours behind. That's why the timestamptz is displayed as 2012-06-21 02:00:00-07 in your example: the abbreviation 'PST' is applied as a fixed UTC-8 offset, and the result is then displayed in the session time zone, which is currently on daylight saving time. These two expressions are displayed with different offsets (one in summer, one in winter) and may result in different dates when cast:
SELECT '2012-06-21 01:00:00'::timestamp AT TIME ZONE 'PST'
, '2012-12-21 01:00:00'::timestamp AT TIME ZONE 'PST'
I know this is an old one, but you may want to consider using AT TIME ZONE 'US/Pacific' when casting to avoid any PST/PDT issues. So:
SELECT starts_at::TIMESTAMPTZ AT TIME ZONE 'US/Pacific'
FROM schedules
WHERE ID = '40';

Postgres UTC date format & epoch cast, sign inversion

Could someone explain this sign inversion to me? I'm lost here...
SELECT EXTRACT(EPOCH FROM '01-01-1970 00:00:00 UTC+01'::timestamp with time zone)
=> 3600
SELECT EXTRACT(EPOCH FROM '1970-01-01 00:00:00+01'::timestamp with time zone)
=> -3600
Postgres 8.3.14
This
1970-01-01 00:00:00+01
is an ISO 8601 timestamp with a +1 hour offset and +1 means east of Greenwich. The offsets in these
01-01-1970 00:00:00 UTC+01
1970-01-01 00:00:00 UTC+01
1970-01-01 00:00:00 XXX+01
1970-01-01 00:00:00 HAHA+01
1970-01-01 00:00:00 Pancakes+01
will be interpreted as POSIX style timezones where +1 means west of Greenwich:
PostgreSQL will accept POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC
and those even come with a warning:
One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich.
Note the west versus east difference.
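The same west-versus-east inversion can be observed outside PostgreSQL. The sketch below is an analogue in Python, not PostgreSQL's parser: it relies on the IANA zone Etc/GMT+1, which follows the POSIX sign convention and is therefore one hour behind UTC, and assumes the time zone database is installed.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# ISO 8601 offset: +01:00 means one hour east of Greenwich (ahead of UTC).
iso = datetime.fromisoformat("1970-01-01 00:00:00+01:00")
print(iso.timestamp())  # -3600.0

# POSIX-style name: "GMT+1" means one hour west of Greenwich (behind UTC).
posix = datetime(1970, 1, 1, tzinfo=ZoneInfo("Etc/GMT+1"))
print(posix.isoformat())  # 1970-01-01T00:00:00-01:00
print(posix.timestamp())  # 3600.0
```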