PostgreSQL: smaller timestamptz type?

A timestamptz value is 8 bytes in PostgreSQL. Is there a way to get a 6-byte timestamptz by dropping some precision?

6 bytes is pretty much out of the question, since there is no data type with that size.
With some contortions you could use a 4-byte real value:
CREATE CAST (timestamp AS bigint) WITHOUT FUNCTION;
SELECT (localtimestamp::bigint / 1000000 - 662774400)::real;
float4
--------------
2.695969e+06
(1 row)
That gives you the number of seconds since 2021-01-01 00:00:00 with a precision of about a second (but since real has only a 24-bit mantissa, the precision deteriorates for dates farther from that point).
But the whole exercise is pretty much pointless. Trying to save 2 or 4 bytes this way is not a good idea:
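To see how quickly a 4-byte real loses precision, here is a minimal Python sketch that rounds "seconds since 2021-01-01" to the nearest float32 using the standard struct module, and reports the gap between adjacent float32 values at a given distance from that epoch. It only models float4 rounding, not PostgreSQL's cast, and it treats the epoch as UTC; the helper names are hypothetical.

```python
import struct
from datetime import datetime, timezone

# Epoch assumed to match the SQL above: 2021-01-01 00:00:00, treated as UTC here.
EPOCH_2021 = datetime(2021, 1, 1, tzinfo=timezone.utc)


def as_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def seconds_since_2021_as_real(ts: datetime) -> float:
    """Seconds since the 2021 epoch, rounded the way a 4-byte real would store them."""
    return as_float32((ts - EPOCH_2021).total_seconds())


def float32_step(x: float) -> float:
    """Gap between x (rounded to float32) and the next larger float32; assumes x >= 0."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here


if __name__ == "__main__":
    for label, seconds in [("1 month", 30 * 86400), ("1 year", 365 * 86400),
                           ("10 years", 3650 * 86400), ("100 years", 36500 * 86400)]:
        print(f"{label:>9} after 2021: resolution {float32_step(seconds)} s")
```

With these assumptions the resolution is 0.25 s one month out, 2 s after one year, 32 s after ten years and 256 s after a century.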
the space savings will be minimal; today, when you can have terabytes of storage with little effort, that seems pointless
if you don't carefully arrange your table columns, you will lose the bytes you think you have won to alignment issues
using a number instead of a proper timestamp data type will make your queries more complicated and the results hard to interpret, and it will keep you from using date arithmetic
For all these reasons, I would place this idea firmly in the realm of harmful micro-optimization.
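The alignment point is easy to underestimate, so here is a rough Python model of how column order affects the data area of a row. It assumes the usual 64-bit layout (real and integer aligned to 4 bytes, bigint and timestamptz to 8 bytes, and the row padded to a multiple of 8) and ignores the tuple header, the NULL bitmap and variable-length types, so treat it as an illustration, not an exact size calculator.

```python
# (size, alignment) in bytes; assumed values for a typical 64-bit PostgreSQL build.
TYPES = {
    "boolean": (1, 1),
    "smallint": (2, 2),
    "integer": (4, 4),
    "real": (4, 4),
    "bigint": (8, 8),
    "timestamptz": (8, 8),
}
MAXALIGN = 8  # assumed maximum alignment; the row data is padded to a multiple of it


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def row_data_width(columns: list[str]) -> int:
    """Bytes used by fixed-width, non-NULL columns stored in the given order."""
    offset = 0
    for column_type in columns:
        size, alignment = TYPES[column_type]
        # Padding is inserted before a column so that it starts on its alignment boundary.
        offset = align_up(offset, alignment) + size
    return align_up(offset, MAXALIGN)


if __name__ == "__main__":
    print(row_data_width(["bigint", "timestamptz"]))             # 16
    print(row_data_width(["bigint", "real"]))                    # 16: the saved 4 bytes become padding
    print(row_data_width(["bigint", "real", "integer"]))         # 16
    print(row_data_width(["bigint", "timestamptz", "integer"]))  # 24
```

In this model, shrinking the timestamp only pays off when another 4-byte column is there to fill the gap it leaves.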

Related

Operating with datetimes in SQLite

I'm interested in the different ways to operate on datetimes in SQLite and in understanding their pros and cons. I could not find a detailed explanation of all the alternatives anywhere.
So far I have learned that:
SQLite doesn't actually have a native storage class for timestamps / dates. It stores these values as NUMERIC or TEXT typed values depending on input format. Date manipulation is done using the builtin date/time functions, which know how to convert inputs from the other formats.
(quoted from here)
When any operation between datetimes is needed, I have seen two different approaches:
julianday function
SELECT julianday(OneDatetime) - julianday(AnotherDatetime) FROM MyTable;
The number of days is returned, and it can be fractional.
Therefore, you can also get other units of time with some extra arithmetic. For instance, to get minutes:
SELECT CAST ((
julianday(OneDatetime) - julianday(AnotherDatetime)
) * 24 * 60 AS INTEGER) FROM MyTable;
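The julianday approach can be tried directly from Python's built-in sqlite3 module with an in-memory database. This sketch wraps the product in ROUND before the CAST: CAST truncates toward zero, so if floating-point error made the product come out as something like 64.9999999, a bare CAST would return 64 instead of 65. The table and column names are the question's placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (OneDatetime TEXT, AnotherDatetime TEXT)")
conn.execute(
    "INSERT INTO MyTable VALUES ('2016-11-04 08:05:00', '2016-11-04 07:00:00')"
)

# Difference in (fractional) days, scaled to minutes, rounded, then made an integer.
(minutes,) = conn.execute(
    """
    SELECT CAST(ROUND((julianday(OneDatetime) - julianday(AnotherDatetime)) * 24 * 60)
                AS INTEGER)
    FROM MyTable
    """
).fetchone()
print(minutes)  # 65
```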
Apparently julianday could cause some problems:
Bear in mind that julianday returns the (fractional) number of 'days', i.e. 24-hour periods, since noon UTC on the origin date. That's usually not what you need, unless you happen to live 12 hours west of Greenwich. E.g. if you live in London, this morning is on the same julianday as yesterday afternoon.
More information in this post.
strftime function
SELECT strftime('%s', OneDatetime) - strftime('%s', AnotherDatetime) FROM MyTable;
The number of seconds is returned. Similarly, you can get other units of time with some extra arithmetic. For instance, to get minutes:
SELECT (strftime('%s', OneDatetime) - strftime('%s', AnotherDatetime)) / 60 FROM MyTable;
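The same check with strftime('%s', ...) in an in-memory database. The format string is single-quoted here because that is SQL's string-literal syntax; SQLite treats double-quoted text as an identifier first and only falls back to a string when no such column exists. strftime returns text, which SQLite converts to integers for the subtraction, so dividing by 60 is integer division (it truncates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (OneDatetime TEXT, AnotherDatetime TEXT)")
conn.execute(
    "INSERT INTO MyTable VALUES ('2016-11-04 08:05:00', '2016-11-04 07:00:00')"
)

seconds, minutes = conn.execute(
    """
    SELECT strftime('%s', OneDatetime) - strftime('%s', AnotherDatetime),
           (strftime('%s', OneDatetime) - strftime('%s', AnotherDatetime)) / 60
    FROM MyTable
    """
).fetchone()
print(seconds, minutes)  # 3900 65
```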
More information here.
My conclusion so far: julianday seems easier to use, but can cause some problems; strftime seems more verbose, but also safer. Both of them return the result in a single unit (days, hours, minutes, or seconds), not a combination of several.
Question
1) Is there any other way to operate on datetimes?
2) What would be the best way to get the difference between two datetimes directly in time format (or date, or datetime), where the datetimes are formatted as 'YYYY-mm-dd HH:MM:SS' and the result is in the same format?
I would have imagined that something like the following would work, but it does not:
SELECT DATETIME('2016-11-04 08:05:00') - DATETIME('2016-11-04 07:00:00') FROM MyTable;
-- desired result: 01:05:00
Julian day numbers are perfectly safe when computing differences.
The only problem would be if you tried to convert them into a date by truncating any fractional digits; this would result in noon, not midnight. (The same could happen if you tried to store them in integer variables.) But that is not what you're doing here.
SQLite has no built-in function to compute date/time differences; you have to convert date/time values into some number first. Whether you use (Julian) days or seconds does not really matter from a technical point of view; use whatever is easier in your program.
If you started with a different format, you might want to convert the resulting difference back into that format, e.g.:
time(difference_value, 'unixepoch') -- from seconds to hh:mm:ss
time(0.5 + difference_value) -- from Julian days to hh:mm:ss
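Putting those conversions together, here is a sketch using Python's sqlite3 that returns the difference between the question's two datetimes as hh:mm:ss, once via seconds and once via Julian days. Keep in mind that time() yields only a time of day, so a difference of 24 hours or more silently wraps around (the third column shows 25 hours coming back as 01:00:00).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
a, b = "2016-11-04 08:05:00", "2016-11-04 07:00:00"

via_seconds, via_julian, wrapped = conn.execute(
    """
    SELECT time(strftime('%s', :a) - strftime('%s', :b), 'unixepoch'),  -- seconds -> hh:mm:ss
           time(0.5 + (julianday(:a) - julianday(:b))),                  -- days -> hh:mm:ss
           time(90000, 'unixepoch')                                      -- 25 h wraps to 01:00:00
    """,
    {"a": a, "b": b},
).fetchone()
print(via_seconds, via_julian, wrapped)  # 01:05:00 01:05:00 01:00:00
```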

Handling oddly-formatted timestamp in Postgres?

I have about 32 million tuples of data of the format:
2012-02-22T16:46:28.9670320+00:00
I have been told that the +00:00 indicates an hours:minutes time zone offset, but also that Postgres only accepts an hour offset (even with decimals), not minutes. So would I have to process the data to remove the trailing :00 from every tuple in order to read the values in as timestamps? I would like to avoid pre-processing the data file, but if Postgres will not accept the values otherwise, then I will do so.
In addition, the given data has 7 decimal places in the seconds part, whereas the Postgres timestamp data type allows at most 6 decimal places (microseconds). Would I have to reduce the precision from 7 decimal places to 6 for Postgres to read the records, or will Postgres convert them automatically as it reads the tuples?
pgsql=# SELECT '2016-07-10 20:12:21.8372949999+02:30'::timestamp with time zone AS ts;
              ts
-------------------------------
 2016-07-10 17:42:21.837295+00
(1 row)
It seems that, at least in PostgreSQL 9.4 and later (maybe earlier too), time zone offsets with minutes are not explicitly documented but are processed correctly when used. Similarly, a timestamp with 7 decimal places in the seconds is automatically rounded to 6 decimal places (microsecond precision).
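If you did want to pre-process or validate the file outside the database anyway, a minimal Python sketch that mirrors the behaviour shown in the psql output (hours:minutes offsets applied, fractional seconds rounded to microseconds) could look like this. The regular expression only covers the layout from the question, and rounding half up is an assumption; this is not a reimplementation of PostgreSQL's parser.

```python
import re
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP

# Matches e.g. 2012-02-22T16:46:28.9670320+00:00 (the 'T' may also be a space).
TS_PATTERN = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[T ](\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?"
    r"([+-])(\d{2}):?(\d{2})"
)


def parse_ts(text: str) -> datetime:
    """Parse one timestamp into an aware UTC datetime, rounding to microseconds."""
    m = TS_PATTERN.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised timestamp: {text!r}")
    year, month, day, hour, minute, second, frac, sign, off_h, off_m = m.groups()

    # Round the fractional seconds to whole microseconds (may round up to 1_000_000).
    micros = int(
        (Decimal("0." + (frac or "0")) * 1_000_000).quantize(
            Decimal(1), rounding=ROUND_HALF_UP
        )
    )

    # Keep the minutes of the offset: +02:30 means 2 h 30 min ahead of UTC.
    offset = timedelta(hours=int(off_h), minutes=int(off_m))
    if sign == "-":
        offset = -offset

    local = datetime(int(year), int(month), int(day), int(hour), int(minute), int(second))
    utc = local - offset + timedelta(microseconds=micros)
    return utc.replace(tzinfo=timezone.utc)
```

For example, parse_ts("2016-07-10 20:12:21.8372949999+02:30") gives 2016-07-10 17:42:21.837295 UTC, the same value as the psql output above.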

Storing date as time in millis

I am trying to represent dates in a data store without the hassle of Java's Date object. So I thought of using just the time in milliseconds and storing the offset from UTC as well. I thought about using simple shift operations to combine everything into a single long, since the time zone needs just 5 bits (+/-12).
Can someone see any problem with this? What other compact storage schemes (other than textual representation) of date exist and how do they compare to this?
I think you're undervaluing granularity in your time zone and overvaluing the need for bits in the timestamp.
A long has 8 bytes to work with.
Let's say you allow yourself 2 bytes for the time zone. That leaves you with 6 for the timestamp: 6*8 = 48 bits.
The largest number a 48 bit unsigned integer can handle is 281474976710655.
Divide by 1000 to convert from milliseconds to seconds: 281474976710.
Punch that number into an epoch converter: 10889-08-02T05:31:50+00:00
That's the year 10,889, when we're in 2015.
Just use 2 bytes for the time zone. You've got the space. That will easily let you represent the time zone as an offset in minutes within +/-24 hours. And since it's whole bytes, the packing code will be simpler to understand.
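The layout suggested above (48 bits of epoch milliseconds, 16 bits of signed minute offset) can be sketched in Python as follows. The bit layout and function names are illustrative assumptions, not an existing API; Python integers are unbounded, so this only models the 64-bit value.

```python
OFFSET_BITS = 16                      # low 2 bytes: signed offset in minutes
MILLIS_BITS = 48                      # high 6 bytes: unsigned epoch milliseconds
MILLIS_MAX = (1 << MILLIS_BITS) - 1   # 281474976710655
MAX_OFFSET_MINUTES = 24 * 60


def pack(epoch_millis: int, offset_minutes: int) -> int:
    """Combine epoch milliseconds and a UTC offset (minutes) into one 64-bit value."""
    if not 0 <= epoch_millis <= MILLIS_MAX:
        raise ValueError("epoch_millis must fit in 48 unsigned bits")
    if not -MAX_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        raise ValueError("offset_minutes must be within +/-24 hours")
    # Two's-complement encode the offset in 16 bits, then place the millis above it.
    return (epoch_millis << OFFSET_BITS) | (offset_minutes & 0xFFFF)


def unpack(packed: int) -> tuple[int, int]:
    """Inverse of pack(): returns (epoch_millis, offset_minutes)."""
    epoch_millis = packed >> OFFSET_BITS
    raw_offset = packed & 0xFFFF
    # Sign-extend the 16-bit offset back to a negative number if its top bit is set.
    offset_minutes = raw_offset - 0x10000 if raw_offset & 0x8000 else raw_offset
    return epoch_millis, offset_minutes
```

In Java the packed value would live in a signed long, whose sign bit is set once the millisecond count reaches 2^47 (thousands of years from now); using the unsigned shift `>>>` when unpacking keeps the extraction correct regardless.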

PostgreSQL field type for unix timestamp?

I am looking for a PostgreSQL field type for a Unix timestamp, so that I can:
store it as a Unix timestamp
retrieve it as a Unix timestamp as well.
I have been going through the Date/Time Types documentation for PostgreSQL 9.1.
Is integer the best way to go? (That is what I did when I was using MySQL, with int(10).)
The Unix epoch timestamp right now (2014-04-09) is 1397071518, so we need a data type capable of storing a number at least this large.
What data types are available?
If you refer to the PostgreSQL documentation on numeric types you'll find the following options:
Name Size Minimum Maximum
smallint 2 bytes -32768 +32767
integer 4 bytes -2147483648 +2147483647
bigint 8 bytes -9223372036854775808 +9223372036854775807
What does that mean in terms of time representation?
Now, we can take those numbers and convert them into dates using an epoch converter:
Name Size Minimum Date Maximum Date
smallint 2 bytes 1969-12-31 1970-01-01
integer 4 bytes 1901-12-13 2038-01-19
bigint 8 bytes -292275055-05-16 292278994-08-17
Note that for bigint, counting in seconds puts the limits so far into the past and the future that they hardly matter; the dates shown for bigint are for a Unix timestamp stored in milliseconds.
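These ranges can be double-checked with a short Python sketch. Python's datetime only covers years 1 to 9999, so the bigint row is estimated from the mean Gregorian year (365.2425 days) instead of being converted exactly; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_GREGORIAN_YEAR = 31_556_952  # 365.2425 days * 86400 s


def from_unix_seconds(n: int) -> datetime:
    """Exact UTC datetime for a Unix timestamp (only within datetime's year range)."""
    return UNIX_EPOCH + timedelta(seconds=n)


def approx_year_from_unix_millis(n: int) -> float:
    """Approximate fractional year for a millisecond timestamp far outside that range."""
    return 1970 + n / 1000 / SECONDS_PER_GREGORIAN_YEAR


if __name__ == "__main__":
    for name, bits in (("smallint", 16), ("integer", 32)):
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        print(name, from_unix_seconds(lo), from_unix_seconds(hi))
    print("bigint (ms), max year ~", int(approx_year_from_unix_millis(2**63 - 1)))
```

In UTC the 32-bit range runs from 1901-12-13 20:45:52 to 2038-01-19 03:14:07, and the estimated bigint maximum lands in the year 292278994.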
So, what have we learned?
smallint is clearly a bad choice.
integer is a decent choice for the moment, but your software will blow up in the year 2038. The Y2K apocalypse has nothing on the Year 2038 Problem.
Using bigint is the best choice. This is future-proofed against most conceivable human needs, though the Doctor may still criticise it.
You might also consider whether it would be better to store your timestamp in another format, such as ISO 8601.
I'd just go with using TIMESTAMP WITH(OUT) TIME ZONE and use EXTRACT to get a UNIX timestamp representation when you need one.
Compare
SELECT NOW();
with
SELECT EXTRACT(EPOCH FROM NOW());
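The same idea on the application side: keep a time-zone-aware timestamp and derive the Unix epoch only when you need it. This Python sketch uses only the standard datetime module; it is an analogue of EXTRACT(EPOCH FROM ...) and PostgreSQL's to_timestamp(), not a call into the database.

```python
from datetime import datetime, timezone

# An aware timestamp, roughly what a timestamptz column holds.
ts = datetime(2014, 4, 9, 19, 25, 18, tzinfo=timezone.utc)

epoch = ts.timestamp()                                 # like EXTRACT(EPOCH FROM ts)
back = datetime.fromtimestamp(epoch, tz=timezone.utc)  # like to_timestamp(epoch)

print(epoch)       # 1397071518.0
print(back == ts)  # True
```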
integer would be good, but not good enough, because PostgreSQL doesn't support unsigned types.

How to reduce date in yyyyMMddHHmmss format to 5 bytes?

I need to generate a suffix to make a value unique. I thought of using the current date and time, but the suffix must be no more than 5 bytes long. Are there any hashing methods that can produce a hash of 5 bytes or less from a date in yyyyMMddHHmmss format?
Any other ideas? It would be simple to maintain a running counter and use the next value, but I would prefer not to rely on any kind of stored value.
If you do not need printable characters, I would suggest simply using the Unix timestamp. That works fine even with 4 bytes (until January 19, 2038).
If you want to use only a subset of characters, I would suggest creating a list of the values you want to use.
Let's say you want to use uppercase and lowercase letters and digits: that gives 62 values.
Now you need to convert the timestamp into base-62. Let's say your timestamp is 100:
100 = (1 * 62^1) + (38 * 62^0)
If you have stored your printable value in an array, you could use the coefficients 1 and 38 as an index into that array.
If you choose too small a base, five bytes will not be enough. In that case you can either subtract a constant from the timestamp (which will buy you some time), or estimate when duplicate values will start to occur and check whether that date is past your retirement date ;-)
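The base-62 conversion described above can be sketched in Python like this. The alphabet order (digits, then uppercase, then lowercase) and the subtracted constant are assumptions for illustration; any fixed order works as long as it never changes.

```python
import string

# Assumed alphabet: 0-9, A-Z, a-z (62 symbols).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62


def to_base62(n: int, width: int = 5) -> str:
    """Encode a non-negative integer as exactly `width` base-62 characters."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n:
        n, remainder = divmod(n, BASE)  # remainder is the next lowest base-62 digit
        digits.append(ALPHABET[remainder])
    encoded = "".join(reversed(digits)) or ALPHABET[0]
    if len(encoded) > width:
        raise OverflowError(f"value needs more than {width} base-62 characters")
    return encoded.rjust(width, ALPHABET[0])


if __name__ == "__main__":
    import time

    # Hypothetical constant subtracted so the value fits in 62**5 for a while.
    SUFFIX_EPOCH = 1_700_000_000
    print(to_base62(int(time.time()) - SUFFIX_EPOCH))
```

With this alphabet, 100 encodes as "0001c" (digits 1 and 38, as in the example above). Five characters cover 62^5 = 916,132,832 seconds, roughly 29 years, and current Unix timestamps are already larger than that, which is why subtracting a constant is needed. Note also that two suffixes generated within the same second will collide.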