How are timestamps internally represented in Google Spanner?

I can't seem to find any information on how Spanner stores timestamps (columns) internally - the documentation only mentions:
Note that this is not the internal representation of the timestamp; it is only a human-understandable way to describe the point in time that the timestamp represents.
Are they stored internally as Unix timestamps (integers)? And if so, how does Spanner store any time zone information?

Cloud Spanner does not store the timezone that was used when inserting a value into a TIMESTAMP column. It only stores the point in time. It is up to the client to choose which timezone should be used to format a TIMESTAMP value that is read from the database.
So if you execute the following:
insert into my_table (id, col_timestamp)
values (1, timestamp '2022-10-01T10:00:00+01:00');
select *
from my_table;
The timezone that is used to render the data that is returned by the select statement is chosen by the client. The actual underlying timestamp string that is returned by Cloud Spanner is always in Zulu time.
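For example, reading back the row inserted above (a sketch; the exact rendering depends on the client library, but the raw value returned by Cloud Spanner is the UTC instant):
select col_timestamp
from my_table
where id = 1;
-- returns 2022-10-01T09:00:00Z: the +01:00 offset from the insert is not preserved, only the point in time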
See also https://cloud.google.com/spanner/docs/reference/standard-sql/data-types#timestamp_type

How to parse a ZonedDateTime format?

I'm currently trying to parse a timestamp that looks like: 2020-08-03T11:37:42.529Z[UTC]
This timestamp was generated with Java's ZonedDateTime data type.
What I have already tried was to parse it via:
SELECT '2020-08-03T11:37:42.529Z[UTC]'::timestamp with time zone;
But that would fail with an exception (unless I parse up until timezone Z).
Edit
For clarification, this is currently a String that is saved in a file, so this application has no direct interaction with Java.
As explained by Adrian, the code below (using at time zone) does not really work. The only other alternative I can think of is to replace the time zone abbreviation with a proper offset. To do that, a function is probably the easiest solution:
create or replace function replace_tz_abbrev(p_input text)
  returns text
as
$$
declare
  l_offset_hours text;
  l_tz_abbrev    text;
begin
  -- extract the zone abbreviation between the square brackets, e.g. 'UTC'
  l_tz_abbrev := substring(p_input from '\[([A-Z]+)\]');

  -- look up the UTC offset for that abbreviation
  select to_char(utc_offset, 'hh24:mi')
    into l_offset_hours
    from pg_catalog.pg_timezone_abbrevs
   where abbrev = l_tz_abbrev;

  -- replace the bracketed abbreviation with that offset
  return regexp_replace(p_input, '\[[A-Z]+\]', l_offset_hours);
end;
$$
language plpgsql;
This is a rough sketch; the function needs some error checking in case the abbreviation doesn't exist, and could maybe check pg_timezone_names as a fallback to deal with names like Europe/Berlin.
The result of replace_tz_abbrev() can be cast to a timestamptz (at least with the given example). This can either be done in the function itself (changing it to returns timestamptz) or when calling it.
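A quick, hypothetical way to try the sketch (as noted, it has only really been checked against the given example):
select replace_tz_abbrev('2020-08-03T11:37:42.529Z[UTC]')::timestamptz;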
The below is not correct
(I'll just leave it here for reference, so that the comments still make sense.)
One way I can think of is to extract the time zone from the string and use it together with the to_timestamp() function:
with data (input) as (
values ('2020-08-03T11:37:42.529Z[UTC]')
)
select to_timestamp(input, 'yyyy-mm-dd"T"hh24:mi:ss.ms') at time zone substring(input from '\[([A-Z]+)\]')
from data;
Assuming the timestamp string always ends with [timezone] then:
select regexp_replace('2020-08-03T11:37:42.529Z[UTC]', '\[[^\]]*\]', '')::timestamp with time zone;
regexp_replace
-----------------------------
08/03/2020 04:37:42.529 PDT
Where regexp_replace() replaces the [timezone] with an empty string and then you cast the 2020-08-03T11:37:42.529Z portion to a timestamp with time zone.
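The PDT in the output above comes from my session's TimeZone setting. If you want the result rendered independently of that, you can additionally convert the value, e.g. to UTC (the display format still depends on your DateStyle):
select (regexp_replace('2020-08-03T11:37:42.529Z[UTC]', '\[[^\]]*\]', '')::timestamp with time zone)
       at time zone 'UTC';
-- 08/03/2020 11:37:42.529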
You can't directly convert this format to a timestamp with time zone. If you are allowed to manipulate the string to get the date/time separately from the time zone, you can do this:
select TIMESTAMP 'The date and time go here' AT TIME ZONE 'timezone name or abbreviation goes here';
select TIMESTAMP '2020-08-03T11:37:42.529' AT TIME ZONE 'Africa/Dar_es_Salaam';
select TIMESTAMP '2020-08-03T11:37:42.529' AT TIME ZONE 'UTC';
The reason you have to split the string is that, as you discovered, a simple conversion does not work, and TO_TIMESTAMP(), the function that lets you specify the format, does not support specifying that the string contains a time zone name - only a time offset (example: -03 hours).
-- No way to include the timezone name in the format parameter.
-- If you insist on adding 'TZ' or 'tz' you will get: ERROR: "TZ"/"tz"/"OF" format patterns are not supported in to_date
select TO_TIMESTAMP('2020-08-03T11:37:42.529Z[UTC]','YYYY-MM-DD HH24:MI:SS.MS');
The difference between a time zone and a time offset is also why you might not want to do what you are trying to do: Postgres does not store time zones (or time offsets). A 'timestamp without time zone' column stores the value exactly as it was given, with no zone attached; a 'timestamp with time zone' column converts the input to UTC and, when you read it, converts from UTC to whatever the current session time zone is set to. If you care about the time zone, you should store it in another column.
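A minimal sketch of that last suggestion (the table and column names here are just made up for illustration):
create table event (
  id          serial primary key,
  occurred_at timestamptz not null, -- stored internally as UTC
  original_tz text not null         -- e.g. 'UTC' or 'Europe/Berlin'
);
insert into event (occurred_at, original_tz)
values ('2020-08-03T11:37:42.529Z'::timestamptz, 'UTC');
-- render the instant back in the zone it was recorded in
select occurred_at at time zone original_tz as local_time, original_tz
from event;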

Postgres truncates trailing zeros for timestamps

Postgres (v11.3, 64-bit, Windows) truncates trailing zeros for timestamps. So if I insert the timestamp '2019-06-12 12:37:07.880' into the table and read it back as text, Postgres returns '2019-06-12 12:37:07.88'.
Table date_test:
CREATE TABLE public.date_test (
  id SERIAL,
  "timestamp" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  CONSTRAINT pkey_date_test PRIMARY KEY(id)
);
SQL command when inserting data:
INSERT INTO date_test (timestamp) VALUES( '2019-06-12 12:37:07.880' )
SQL command to retrieve data:
SELECT dt.timestamp::TEXT FROM date_test dt;
returns '2019-06-12 12:37:07.88'
Do you consider this a bug or a feature?
My real issue is: I'm running queries from a C++ program and I have to convert the data returned from the database to the appropriate data types. Since the protocol is text-based, everything I read from the database is plain text. When parsing timestamps I first tokenize the string and then convert each token to an integer. And because the millisecond part is truncated, the last token is "88" instead of "880", and converting "88" to an integer yields a different value than converting "880".
That's the default display format when using a cast to text.
If you want to see all three digits, use to_char()
SELECT to_char(dt.timestamp, 'yyyy-mm-dd hh24:mi:ss.ms')
FROM date_test dt;
will return 2019-06-12 12:37:07.880
It’s a matter of presentation only.
First note that 07.88 seconds and 07.880 seconds are the same amount of time (as are 7.88 and 07.880000000, for that matter).
PostgreSQL internally represents a timestamp in a way that we shouldn’t be concerned about as long as it’s an unambiguous representation. When you retrieve the timestamp, it is formatted into some string. This is where PostgreSQL apparently chooses not to print redundant trailing zeros. So it’s probably not even correct to say that it truncates anything. It just refrains from generating that 0.
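A quick sanity check along those lines, comparing the two renderings from the question:
select '2019-06-12 12:37:07.880'::timestamp = '2019-06-12 12:37:07.88'::timestamp;
-- true: both strings describe exactly the same timestamp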
I think the nicest solution would be to modify your C++ parser to accept any number of decimal digits and parse them correctly with and without trailing zeroes. Another solution that should work is given in the answer by a_horse_with_no_name.

Is there any equivalent function in Postgres like Oracle's systimestamp()?

I'm working on an Oracle to PostgreSQL database migration project. I need to read the operating system date and time from Postgres. In Oracle, sysdate() returns the system date and time as a DATE and systimestamp() returns a TIMESTAMP, regardless of how the time_zone variable is set. But in Postgres, current_date and current_timestamp always give the result relative to the timezone variable set for that database.
Synchronizing the timezone variable (e.g. set timezone='utc') is one way, but I don't want my timezone variable to be changed.
All I want is the current date and time of my system (with or without the time zone), like in Oracle. Any PL/pgSQL would be helpful. Thanks
The data type TIMESTAMP WITH TIME ZONE is different in Oracle and PostgreSQL.
While Oracle stores the time zone information along with the timestamp, PostgreSQL stores the timestamp in UTC and displays it in the currently set time zone (available with the SQL statement SHOW TimeZone).
So the functions return the same time in PostgreSQL and Oracle, but it is displayed in a different fashion. That should normally be no problem.
If you really need to store time zone information along with a timestamp, you'll have to use a separate field to store the time zone information. You can then use AT TIME ZONE to convert the timestamp to that time zone for display.
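If all you need is the current instant rendered in a fixed zone, you can convert it on output instead of changing the session setting (a sketch, using UTC as the fixed zone):
SHOW TimeZone;                               -- the session's current zone
select current_timestamp;                    -- rendered in that zone
select current_timestamp at time zone 'UTC'; -- the same instant, shown as UTC wall-clock time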

Create timestamp index from JSON on PostgreSQL

I have a PostgreSQL table with a jsonb field named data that contains a lot of objects, and I want to create an index to speed up the queries. I'm only using a few rows to test (just 15), but I don't want to have problems with the queries in the future. I'm getting data from the Twitter API, so in a week I collect around 10 GB of data. If I create the normal index
CREATE INDEX ON tweet((data->>'created_at'));
I get a text index. If I create:
CREATE INDEX ON tweet((CAST(data->>'created_at' AS timestamp)));
I get
ERROR: functions in index expression must be marked IMMUTABLE
I've tried to make it "immutable" by fixing the time zone with
date_trunc('seconds', CAST(data->>'created_at' AS timestamp) at time zone 'GMT')
but I'm still getting the "immutable" error. So, how can I create a timestamp index on a JSON field? I know that I could add a plain column with the date, since it will probably remain constant over time, but I want to learn how to do this.
This expression won't be allowed in the index either:
(CAST(data->>'created_at' AS timestamp) at time zone 'UTC')
It's not immutable, because the first cast depends on your DateStyle setting (among other things). Translating the result to UTC after that cast doesn't help; the uncertainty has already crept in by then.
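A quick illustration of that DateStyle dependency (using an ambiguous literal rather than the ISO-style one from the question):
set datestyle = 'ISO, MDY'; select '01/02/2020'::timestamp;  -- 2020-01-02 00:00:00
set datestyle = 'ISO, DMY'; select '01/02/2020'::timestamp;  -- 2020-02-01 00:00:00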
The solution is a function that makes the cast immutable by fixing the time zone (like #a_horse already hinted).
I suggest using to_timestamp() (which is also only STABLE, not IMMUTABLE) instead of the cast to rule out some sources of trouble - DateStyle being one.
CREATE OR REPLACE FUNCTION f_cast_isots(text)
RETURNS timestamptz AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI')$$ -- adapt to your needs
LANGUAGE sql IMMUTABLE;
Note that this returns timestamptz. Then:
CREATE INDEX foo ON t (f_cast_isots(data->>'created_at'));
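For the planner to use that index, queries have to repeat the same expression, e.g. (a sketch against the table t from above; the date range is arbitrary):
SELECT *
FROM   t
WHERE  f_cast_isots(data->>'created_at') >= timestamptz '2020-08-01'
AND    f_cast_isots(data->>'created_at') <  timestamptz '2020-08-08';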
Detailed explanation for this technique in this related answer:
Does PostgreSQL support "accent insensitive" collations?
Related:
Query on a time range ignoring the date of timestamps

Typeof Date Field Shows Text Affinity in sqlite3

Take this sqlite3 interactive session:
sqlite> create table d(t date);
sqlite> insert into d values('1913-12-23');
sqlite> select t,typeof(t) from d;
1913-12-23|text
sqlite>
Why is the date field showing up as type text? Would storage space be smaller and queries be faster if it were numeric?
SQLite3's datatype documentation shows that the affinity for date columns is NUMERIC, according to the table in §2.2. I would guess that date fields stored as numeric values, as opposed to text, would take up less space, so that would be ideal. I may very well not have a good grasp of SQLite yet, so my concerns may be invalid.
For easy copy/paste:
create table d(t date);
insert into d values('1913-12-23');
select t,typeof(t) from d;
The reason that typeof('1913-12-23') returns text is that '1913-12-23' is a string, and is stored in the table as a string.
Type affinity comes into play only when the conversion would be lossless; e.g., the value '123456' would be converted into a number.
SQLite does not have a native date data type.
To allow dates to be handled by the built-in date functions, you have to store them either as yyyy-mm-dd strings or as numbers (either Julian day numbers, or seconds since the Unix epoch).
Integers take up less storage than strings, so queries are faster if the DB's bottleneck is I/O, which is likely.
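To see the difference, here is a small sketch storing the same date both as text and as a Julian day number (table and column names are arbitrary; date() interprets a bare number as a Julian day):
create table d2(t_text text, t_julian real);
insert into d2 values ('1913-12-23', julianday('1913-12-23'));
select date(t_text), date(t_julian), typeof(t_text), typeof(t_julian) from d2;
-- 1913-12-23|1913-12-23|text|real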