PostgreSQL: require explicit time zone on INSERT/UPDATE of timestamptz columns - postgresql

I'm digging into how Postgres works and have decided that any date/time data in my database should be of datatype timestamptz.
The rules that govern how Postgres parses date/time information vary based on the server's timezone, the client session timezone, and/or the database timezone setting. I can't expect my developers to know all of this, so to avoid any ambiguity I would like to somehow procedurally require that a time zone be specified in any INSERT or UPDATE to a timestamptz column, and for any UPDATE or INSERT to fail when the input value for a timestamptz column doesn't explicitly include a time zone. I've created a regex that I can use to match against the input value; I just don't know how to hook up the plumbing.
I first thought I could do this with a custom domain; however, it appears that the CHECK constraint on a domain is done after the input string has already been parsed, so that won't work. (By then, the server has already inferred the time zone for values where time zone wasn't explicitly included.)
I could use a custom data type, but that's a whole can of worms, and I'm not sure that doing so would preserve all of the operators and functions that operate on the underlying timestamptz column.
I could use BEFORE INSERT and BEFORE UPDATE triggers, but doing so would require me to iterate over every column in the NEW record, determine its datatype, then check the value against a regex to ensure a time zone is specified.
Does the community have any ideas on how to accomplish this? I think the BEFORE INSERT/BEFORE UPDATE is likely the best place to do this work, but I don't know how to iterate over the new record and find the data type for each column.
Is there an easier way to accomplish this that I've missed?

I can't expect my developers to know all of this
I think that's your problem. If you want to use PostgreSQL and work with time zones, you need your developers to understand it.
It's all very simple: just set the timezone parameter correctly for the client session, and everything will work.
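To make the ambiguity concrete, here is a minimal Python sketch (not PostgreSQL itself) that mimics how a timestamp string without an offset resolves to different instants depending on the session time zone, plus a simple client-side check for an explicit offset. The helper names, the fixed UTC-3 offset, and the regex are assumptions for illustration; the regex is not the asker's and does not cover named zones.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern: a time of day followed by Z or a numeric offset
# such as +02, -0300 or +05:30. Named zones are not handled in this sketch.
EXPLICIT_OFFSET = re.compile(
    r"\d{2}:\d{2}(:\d{2}(\.\d+)?)?\s*(Z|[+-]\d{2}(:?\d{2})?)$"
)

def has_explicit_offset(literal: str) -> bool:
    """Return True if the timestamp literal ends with an explicit UTC offset."""
    return EXPLICIT_OFFSET.search(literal.strip()) is not None

def resolve(literal: str, session_tz: timezone) -> datetime:
    """Interpret an offset-less literal in session_tz and return the UTC instant."""
    naive = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=session_tz).astimezone(timezone.utc)

utc_session = timezone.utc
brt_session = timezone(timedelta(hours=-3))  # assumed fixed UTC-3 session zone

a = resolve("2019-08-20 12:00:00", utc_session)
b = resolve("2019-08-20 12:00:00", brt_session)
print(b - a)  # the same text denotes two instants three hours apart
```

A check like has_explicit_offset could run in the application before values are sent to the database, which avoids the problem that the server has already parsed the string by the time a CHECK constraint sees it.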

Related

now() function is populating a different date/time than expected in PostgreSQL 9.6.9

We have some columns in the database with data type 'timestamp with time zone', populated with the now() function as the default value. Sometimes they get populated with dates from 2017/18. What might be the issue?
If you are using now() as default value, the likely cause of your problem is that these dates are inserted explicitly. Remember that the default value is only used if no value is supplied for the column.
There are other, much less likely explanations like database transactions that have been open for a year, but I don't believe that.
If you always want to override the value, use a BEFORE trigger.

Index for TIMESTAMPTZ and function immutability

We have a structure similar to the following:
create table company
(
    id bigint not null,
    tz text   not null
);
create table company_data
(
    company_id bigint                   not null,
    ts_tz      timestamp with time zone not null
);
The tables are simplified.
Fiddle with sample data here: SQL Fiddle
Every company has a fixed TZ. So, when we need to extract some information from company_data we use a query similar to the following:
select
cd.company_id,
cd.ts_tz at time zone c.tz
from company_data cd
join company c on c.id = cd.company_id;
We also have a function to get company tz:
create or replace function tz_company(f_company_id bigint) returns text
    language plpgsql
as
$$
declare
    f_tz text;
begin
    select c.tz from company c where c.id = f_company_id into f_tz;
    return f_tz;
end;
$$;
And another to convert a timestamp to a date after applying a time zone:
create or replace function tz_date(timestamp with time zone, text) returns date
    language plpgsql
    immutable strict
as
$$
begin
    return ($1 at time zone $2) :: date;
end;
$$;
The problem we are having now is that company_data (and other similar tables) is a large and frequently used table. The majority of the SELECTs on that table filter by a DATE.
For example:
select cd.company_id,
cd.ts_tz at time zone tz_company(cd.company_id)
from company_data cd
where tz_date(cd.ts_tz, tz_company(cd.company_id)) >= '2019-08-20'
and tz_date(cd.ts_tz, tz_company(cd.company_id)) <= '2019-08-22';
So, to speed up queries, we need to add an index on the company_data.ts_tz column. The only way we found to do this was the following:
create index idx_company_data_ts_tz on company_data
(((company_data.ts_tz at time zone tz_company(company_data.company_id))::date));
For this to work, we need to make the tz_company function immutable.
Some other problems (and ideas) emerged:
1 - The version of the query using the tz_date function does not use the index.
Does not use the index:
explain analyse
select cd.company_id,
cd.ts_tz at time zone tz_company(cd.company_id)
from company_data cd
where tz_date(cd.ts_tz, tz_company(cd.company_id)) >= '2019-08-20'
and tz_date(cd.ts_tz, tz_company(cd.company_id)) <= '2019-08-22';
Uses index:
explain analyse
select cd.company_id,
cd.ts_tz at time zone tz_company(cd.company_id)
from company_data cd
where (cd.ts_tz at time zone tz_company(cd.company_id))::date >= '2019-08-20'
and (cd.ts_tz at time zone tz_company(cd.company_id))::date <= '2019-08-22';
Why does that happen?
2 - We know that, in theory, tz_company should not be immutable, at most stable. But the company tz is information that should not change, ever. Yes, it could happen, but it is improbable. In the past three years, we have never changed the tz of any company. So, is it still a problem for tz_company to be immutable? If it is, how could we rewrite the index? Note that a single SELECT could return information for more than one company and mix different timezones.
3 - Because of the complexity of dealing with indexes on a timestamptz column, we are considering adding another column to every table that has a ts_tz. This new column would be a date with the tz already applied. Is this a good approach?
Besides, we need to apply the tz before casting because every client (company) selects only dates to filter, and these dates are locale aware (tz aware).
EDIT 1:
The queries used are only for demonstration. But the client must see the timestamps in the timezone where the event occurred; this is an important requirement. We deal with logistics operations in Brazil, and Brazil itself has four different timezones across the country.
A holding could own different companies, and every company could be in a different timezone.
So, a lot of queries deal with different companies in different timezones and apply some date filtering. Today, our backend returns all data ready to display, with the timezone applied, and this would be difficult to change.
What we want to achieve is an easy and performant way of dealing with those timestamptz columns: filtering by date (tz aware) and using indexes to speed up queries.
1 - The planner uses an expression index only when the query contains the same expression as the index definition. tz_date is a PL/pgSQL function, which the planner treats as a black box and never inlines, so tz_date(...) does not match the indexed expression even though it is marked immutable. Marking such a function immutable is safe if Postgres lets you create an index directly on the expression in the function body, because it only allows that for immutable expressions. Some Postgres date-time manipulation functions and type casts are immutable, some aren't. BTW, I'm not sure what happens to an index if the at time zone operator breaks its immutability contract when tzdata changes -- that happens quite often on a Postgres or OS upgrade, depending on the settings.
2 - That's a very dangerous approach: the index becomes corrupted if you change the data in company.tz. You may lose data. If you absolutely need this pseudo-immutable function, I would strongly recommend adding a trigger that disallows deletes, truncates and updates of company.tz. If you ever need to change the time zone data, drop the index first.
3 - The key question is whether you ever query data across multiple companies.
a) If you do, it only makes numerological sense. 2019-09-13 events from Niue (UTC-11) and 2019-09-13 events from New Zealand (UTC+13) can never happen at the same time. The only common property of these events is that they happened on Friday the 13th. That's only notation; it never was 2019-09-13 in both countries at the same time. So please make sure your queries really make sense. In this unlikely case, denormalizing the date notation into a separate timestamp without time zone column would make sense, as you're filtering by the notation of time, not by the moment in time. I would recommend a trigger to populate it.
b) All your queries are single-company. In this case I would create a plain index on the columns only, with no expressions, create a function, and write queries like this:
create index on company_data(company_id, ts_tz);
create function midnight_at_company(p_date date, p_company_id bigint)
returns timestamp with time zone
language sql stable strict -- stable (it reads a table) so the planner can use it in an index condition
as $$
select p_date::timestamp at time zone tz from company where id = p_company_id;
$$;
-- put your company id instead of $1
explain analyse
select cd.company_id,
cd.ts_tz at time zone tz_company(cd.company_id)
from company_data cd
where company_id = $1
and cd.ts_tz >= midnight_at_company('2019-08-20', $1)
and cd.ts_tz < midnight_at_company('2019-08-23', $1); --note exact `<`, not `<=`
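The half-open range used above can be sketched outside the database as well. The following Python snippet is a minimal illustration, assuming a fixed UTC-3 offset for the company (a real IANA zone with DST rules would need a proper time zone library); the function name local_day_range is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

def local_day_range(first_day: date, last_day: date, tz: timezone):
    """Return UTC bounds [start, end) covering first_day..last_day inclusive in tz."""
    start = datetime.combine(first_day, time.min, tzinfo=tz)
    # The upper bound is midnight of the day AFTER last_day, compared with `<`.
    end = datetime.combine(last_day + timedelta(days=1), time.min, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

company_tz = timezone(timedelta(hours=-3))  # assumption: company is at a fixed UTC-3
start, end = local_day_range(date(2019, 8, 20), date(2019, 8, 22), company_tz)

def in_range(ts: datetime) -> bool:
    """Same shape as `ts_tz >= lower and ts_tz < upper` in the query."""
    return start <= ts < end

print(start.isoformat(), end.isoformat())
```

Because the bounds are computed once per query and the column itself is left untouched, a plain index on (company_id, ts_tz) can serve the comparison, and no event in the last fraction of a second of the final day is missed.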
I would standardize all the time zones into one, calling it database or server time. I understand that companies are in different places, but that is not a good reason to have time zones all over your data. Using this method eliminates the need for a time zone reference table. When you pull the data for any one of these companies, write your code to take the server time zone into account so that it displays in your local time.
This will eliminate tons of potential confusion. This method is used across the world; that is why timestamps in most APIs use only one time zone.
In response to Edit:
Hi @Luiz
Let me start by saying there is no right or wrong answer; it's whatever you think works best. In my case, I am of the opinion that the front-end view and the data should be managed somewhat separately. On the data side, as per this topic, I would handle all timestamps using server time. The need to view data one way or another is a front-end issue.
For your requirement, I would either hard-code a JS switch like this:
function companyTimeZone(company) {
  switch (company) {
    case "CompanyA":
      return "EST"; // time zone for CompanyA
    case "CompanyB":
      // code block
      break;
    default:
      // code block
  }
}
or, if there are too many companies to hard-code, I would make a table with the "Company ID", "Company name", and "Time Zone Code". Do not link this table to your data. You should add the "Company ID" to the main table with events that have the server time zone.
Use the table with the company time zone codes to populate the lookup filter that will be used to run your query. When your script event handler reacts to the drop-down menu, it saves the time zone code associated with that company and uses that value when displaying times in accordance with your requirement. I would also have your code load data asynchronously (1000 records or so every few milliseconds) instead of all at once. This will vastly improve perceived performance, and the user will not be able to tell that their data is still loading.
These efforts will let you manipulate the time zone to meet current and future requirements that might come up.
I think the schema you are currently using is not the best fit for this problem.
You will have a lot of problems storing different time zones in the same table.
Use UTC, and only UTC, at the DB/schema level; you can also set that in the Postgres configuration.
Depending on the application, you could send back UTC dates and convert them to the user's local time in JavaScript or on the server side. If that's not possible, have one place where the user specifies their current UTC offset, and convert to their time right before you display the date/time.
This will make your life much simpler, and you can achieve great performance at the query level with a schema that is index-friendly. The SQL functions you have make little sense, as you can get much better performance just by using plain indexes in the DB.
So for your specific requirements, I would keep the schema you have with some additions: I would index id on the company table and store all timestamps in company_data in UTC.
When company data is requested, fetch the time zone (text) from the company table and let the backend code or JS do the time zone conversion.
There is a limited number of time zones, so you can keep them in config to make the lookup easier and faster.

Truncate datetimes by second for all queries, but keep milliseconds stored in Postgres

I'm trying to find a way to tell Postgres to truncate all datetime columns so that they are displayed and filtered by seconds (ignoring milliseconds).
I'm aware of the
date_trunc('second', my_date_field)
method, but do not want to do that for all datetime fields in every select and where clause that mentions them. Dates in the where clause need to also capture records with the granularity of seconds.
Ideally, I'd avoid stripping milliseconds from the data when it is stored. But then again, maybe this is the best way. I'd really like to avoid that data migration.
I can imagine Postgres having some kind of runtime configuration like this:
SET DATE_TRUNC 'seconds';
similar to how timezones are configured, but of course that doesn't work and I'm unable to find anything else in the docs. Do I need to write my own Postgres extension? Did someone already write this?
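One way to filter at second granularity without touching the stored milliseconds is to compare against a half-open one-second range rather than truncating the column. The Python sketch below only illustrates that equivalence; the helper name second_bounds and the sample values are assumptions for illustration.

```python
from datetime import datetime, timedelta

def second_bounds(ts: datetime):
    """Return [start, end) bounds covering the whole second that contains ts."""
    start = ts.replace(microsecond=0)
    return start, start + timedelta(seconds=1)

# Hypothetical stored values that keep their sub-second precision.
stored = [
    datetime(2019, 8, 20, 10, 15, 30, 123000),
    datetime(2019, 8, 20, 10, 15, 30, 999000),
    datetime(2019, 8, 20, 10, 15, 31, 0),
]

lower, upper = second_bounds(datetime(2019, 8, 20, 10, 15, 30))
matches = [ts for ts in stored if lower <= ts < upper]
print(len(matches))
```

The same shape in SQL (column >= lower and column < upper) leaves the column unwrapped, so an ordinary index on it remains usable, whereas wrapping the column in date_trunc generally requires a matching expression index.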

Perl DBIx::Class: getting the current time from the Database

Here is my problem:
I want to calculate how long ago a record was updated in a DB.
The DB is in PostgreSQL, the update_time field is populated by a trigger that uses CURRENT_TIMESTAMP(2). The field is inflated to a DateTime object by DBIx::Class. I get the current time in my code using DateTime->now()
My problem is that when I retrieve the field value, it's off by 1 h (ie it's 1h ahead of DateTime->now()). I am in the CET time zone, so 1h ahead of UTC currently.
The right way to solve the problem is likely at the DB level. I have tried to replace CURRENT_TIMESTAMP with LOCALTIMESTAMP, to no avail.
I think actually a more robust solution (ie one that doesn't rely on getting the DB right) would be to get the current time stamp from the DB itself. I really just need the epoch, since that's what I use to compute the difference.
So the question is: is there a simple way to do this: get the current time from the DB using DBIx::Class?
A different way to get the DB and DateTime to agree on what the current time is would also be OK!
You can use dbh_do from your DBIx::Class::Storage to run arbitrary queries. With that, just SELECT the CURRENT_TIMESTAMP.
my ($timestamp) = $schema->storage->dbh_do(
    sub {
        my ($storage, $dbh) = @_;
        # selectrow_array returns the first row as a list
        return $dbh->selectrow_array("SELECT CURRENT_TIMESTAMP");
    },
);
I always recommend doing all date/time-related work on the app server rather than relying on the database server(s). Essentially, that means not using a trigger but passing the datetime on insert/update and making the column mandatory (NOT NULL).
Besides that, you should store datetimes in UTC and convert to your local or other required timezone in your code.
Your issue likely happens because of an incorrect or missing timezone configuration, in which case DateTime defaults to its floating timezone.
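The one-hour discrepancy can be reproduced outside Perl. This Python sketch, with made-up sample values and a fixed UTC+1 offset standing in for CET, shows that the epoch of a zone-aware timestamp does not depend on the display zone, while a value that has lost its zone and is then read as UTC is off by exactly the offset.

```python
from datetime import datetime, timedelta, timezone

cet = timezone(timedelta(hours=1))  # assumption: fixed UTC+1, no DST handling

# Hypothetical update time, expressed in UTC and in CET: the same instant.
updated_utc = datetime(2019, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
updated_cet = updated_utc.astimezone(cet)  # wall clock 13:00 at +01:00

# Epoch seconds are identical regardless of the zone used for display.
same_epoch = updated_utc.timestamp() == updated_cet.timestamp()

# A "floating" value keeps the CET wall clock but drops the zone.
floating = updated_cet.replace(tzinfo=None)
# Reading that wall clock as if it were UTC shifts the instant by one hour.
misread = floating.replace(tzinfo=timezone.utc)
skew_seconds = misread.timestamp() - updated_utc.timestamp()

print(same_epoch, skew_seconds)
```

This is why comparing epochs works once both sides carry the correct zone, and why a floating timestamp compared against a UTC "now" appears to be one hour ahead.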

Can Perl DBIx::Class override the way a column is retrieved from the database?

I have never used DBIx::Class until today, so I'm completely new at it.
I'm not sure if this is possible or not, but basically I have a table in my SQLite database that has a timestamp column in it. The default value for the timestamp column is "CURRENT_TIMESTAMP". SQLite stores this in the GMT timezone, but my server is in the CDT timezone.
My SQLite query to get the timestamp in the correct timezone is this:
select datetime(timestamp, 'localtime') from mytable where id=1;
I am wondering if it is possible in my DBIx schema for "MyTable" to force it to apply the datetime function every time it is retrieving the "timestamp" field from the database?
In the cookbook it looks like it is possible to do this when using the ->search() function, but I am wondering if it's possible to make it so if I'm using search(), find(), all(), find_or_new(), or any function that will pull this column from the database, it will apply the datetime() SQLite function to it?
DBIx::Class seems to have great documentation - I think I'm just so new at it I'm not finding the right places/things to search for.
Thanks in advance!
I've used InflateColumn::DateTime in this way and with a timestamp, and I can confirm it works, but I wonder if you have this backward.
If your column is in UTC, mark the column UTC, and then it should be a UTC time when you load it. Then when you set_timezone on the DateTime (presumably that would be an output issue - it's at output that you care it's locally zoned) you can set it to local time and it will make the necessary adjustment.