Why is the kdb+ epoch date 2000.01.01?

I am new to kdb+ and I was wondering why the kdb+ epoch date of 2000.01.01 differs from that of Unix (1970.01.01).
Does the difference affect any interactions with the operating system or other languages?

kdb+ uses a different epoch because it follows a different standard.
kdb+ follows the J2000 international standard, which is based on the Julian year.
Unix uses POSIX time, which was initially stored as a 32-bit unsigned integer. Because time was counted in sixtieths of a second, a 32-bit integer could span only about 829 days, so a recent date had to be chosen as the epoch.
The first edition of the Unix Programmer's Manual, dated November 3, 1971, defines Unix time as "the time since 00:00:00, Jan. 1, 1971, measured in sixtieths of a second".
This difference can cause problems if you do not convert to a single standard before passing epoch-based times to other applications.
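The figure of about 829 days can be checked with a little arithmetic. The following is a minimal Python sketch; the constant and function names are illustrative and do not come from any Unix source.

```python
# Range of an unsigned counter that ticks 60 times per second.
TICKS_PER_SECOND = 60
SECONDS_PER_DAY = 86_400


def counter_range_days(bits: int, ticks_per_second: int) -> float:
    """Days until an unsigned counter of the given width overflows."""
    return 2**bits / ticks_per_second / SECONDS_PER_DAY


days = counter_range_days(32, TICKS_PER_SECOND)
print(f"{days:.1f} days, about {days / 365.25:.2f} years")
```

The result is a little over 828 days, or roughly two and a quarter years, which is why such a counter needed an epoch close to the date it was used.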

Interactions with the operating system or other languages can be handled because kdb+ can parse Unix epoch timestamps.
From http://code.kx.com/q/ref/casting/#tok:
Parsing Unix timestamps (from seconds since Unix epoch), string with 9…11 digits:
q)"P"$"10129708800"
2290.12.31D00:00:00.000000000
q)"P"$"00000000000"
1970.01.01D00:00:00.000000000
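When the conversion has to happen outside q, the two epochs differ by a fixed offset. Here is a minimal Python sketch, assuming the kdb+ timestamp is available as an integer count of nanoseconds since 2000.01.01. The function names are hypothetical and not part of any kdb+ client library.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
KDB_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

# Fixed distance between the two epochs.
OFFSET_DAYS = (KDB_EPOCH - UNIX_EPOCH).days   # 10957
OFFSET_SECONDS = OFFSET_DAYS * 86_400         # 946684800


def kdb_ns_to_unix_seconds(kdb_ns: int) -> int:
    """Nanoseconds since 2000.01.01 -> whole seconds since 1970.01.01."""
    return kdb_ns // 1_000_000_000 + OFFSET_SECONDS


def unix_seconds_to_kdb_ns(unix_seconds: int) -> int:
    """Whole seconds since 1970.01.01 -> nanoseconds since 2000.01.01."""
    return (unix_seconds - OFFSET_SECONDS) * 1_000_000_000


print(OFFSET_DAYS, OFFSET_SECONDS)   # 10957 946684800
print(kdb_ns_to_unix_seconds(0))     # 946684800
```

Doing the conversion in one place, at the boundary between systems, avoids mixing the two conventions further downstream.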

kdb+ is available on many operating systems: Windows, Linux-x86, Linux-ARM and OSX builds can currently be downloaded from Kx, and a Solaris build was previously available.
From the page about system time on Wikipedia we can see that operating systems use differing epoch dates and ranges. Two of the operating systems that kdb+ supports have the following epochs and ranges:
OS            Epoch or range
-------------------------------------------------
Unix/POSIX    1 January 1970 to 19 January 2038
Windows       1 January 1601 to AD 30,828
Adopting either the Linux or the Windows epoch would still leave kdb+ mismatched with the other. Further reading on that page shows that many other languages also use their own distinct epoch dates and ranges.
In short, there is no reason a language needs to use the epoch time of the OS it is running on.

What do you call the number of *days* since the unix epoch?

I initially learned that Unix time is the number of seconds that have elapsed since 00:00:00 (UTC) on 1 January 1970. With 24 hours in a day, that means that the unix timestamp grows by 86400 every day.
Then I heard about the concept of leap seconds, and thought that would mean that maybe on some days, the unix timestamp would grow by 86401 seconds in a day, but apparently this is not the case. From what I've read, every day is treated as if it contains exactly 86400 seconds. When you get a leap second, the operating system will 'fudge' it in some way to make sure there's still 86400 timestamps - either make every 'second' that day a little bit longer than a real SI second, or they'll report the same integer timestamp twice in a row.
So I think that this means that every date since 1 Jan 1970 can be mapped to a unique integer which is the timestamp at 00:00:00 (UTC) that day divided by 86400. (guaranteed to be an integer with no remainder because as discussed every day has to have 86400 timestamps). Alternatively you could take any timestamp during that day and calculate floor(timestamp / 86400).
For example, today, Fri 23rd April 2021 - timestamp at 00:00:00 UTC was 1619136000.
As expected, this is a multiple of 86400, and 1619136000 / 86400 = 18740.
There have been 18740 days since the unix epoch.
So my question is:
Does this integer already have a well-known name? Is it already widely used in software for representing dates? I've not been able to find any reference online to this concept.
Is my logic here correct? Is there really a unique integer for each date, and can you easily calculate it in your code as timestamp_at_midnight_utc / 86400? Or is there some subtle problem that I've overlooked?
My motivation here is that I often have to do complicated calculations involving lots of dates without any time information (I work for a vacation rentals company where each unit has its own availability calendar). I think I could make a lot of efficiency improvements in my code if I were working with integers uniquely representing a date, instead of DateTime objects or strings like '2021-04-23'.
Yes, your logic is correct. What still worries me is that it requires you to do your calculations in UTC. Holiday rentals happen in a local time zone, and associating a date in that time zone with the start of the day in UTC could quickly get confusing.
And yes, the concept of a count of days since 1970-01-01 is sometimes used, though not often that I have seen.
In the Java documentation the terms “epoch day” and “epoch day count” are used, but this doesn’t make these terms a standard.
I think that the first avenue for you to consider is whether either your programming language comes with a library for counting days without the need to convert to and from seconds, or there is a trustworthy third-party library that you may use for the purpose.
This Java snippet confirms your calculation:
import java.time.LocalDate;
import java.time.Month;

// A LocalDate in Java is a date without time zone or UTC offset
LocalDate date = LocalDate.of(2021, Month.APRIL, 23);
long epochDayCount = date.toEpochDay();
System.out.println("Epoch day: " + epochDayCount);
Output agrees with the result you got:
Epoch day: 18740
Link: Epoch day count in the Java documentation.
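The same check can be sketched in Python using only the standard library. The helper names `to_epoch_day` and `from_epoch_day` are made up for this illustration; they mirror the idea of Java's `toEpochDay` but are not a standard Python API.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_epoch_day(d: date) -> int:
    """Number of days from 1970-01-01 to the given calendar date."""
    return (d - EPOCH).days


def from_epoch_day(n: int) -> date:
    """Inverse of to_epoch_day."""
    return EPOCH + timedelta(days=n)


day = to_epoch_day(date(2021, 4, 23))
print(day)                    # 18740
print(1619136000 // 86400)    # 18740, the timestamp-based calculation
print(from_epoch_day(day))    # 2021-04-23

# With integer days, the length of a stay is a plain subtraction.
nights = to_epoch_day(date(2021, 5, 2)) - day
print(nights)                 # 9
```

Because no time of day or time zone is involved, this form sidesteps the UTC question entirely as long as the input is already a calendar date.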
From my experience there is no official name for "days since epoch". Some nuances that can be detected about UNIX time (and its measurement units):
1. It appears to be (relatively) officially defined as the number of seconds since the UNIX epoch.
2. The main purpose of the UNIX time mechanism (regardless of measurement unit conventions) is to define a point in time.
3. In the context of point #2, in practice, it has already become traditional that the UNIX timestamp is often returned in milliseconds.
There are several factors that can influence the measurement unit that is available to you:
design decisions by APIs, libraries and programming languages
time resolution / clock frequency of the software & hardware that you are running on - e.g. some circuits, controllers or other entities aren't able to reach millisecond resolution or they don't have enough bits available in memory to represent big numbers.
performance reasons - offering a time service at millisecond or second resolution via HTTP might prove too much for networks / server CPUs. The next best thing would be a UNIX timestamp in minutes. This value can then be cached by intermediary caches for the duration of 1 minute.
use cases - there are epochs (e.g. in astronomy) where the day is the main measurement unit.
Here are a few examples of such day-based epochs:
The Julian Day system - which has a non-integer Julian Date (JD) but an integer Julian Day Number (JDN). Its epoch is at noon on 24 November 4714 BC in the proleptic Gregorian calendar (1 January 4713 BC in the proleptic Julian calendar).
J2000 epoch - measured via Julian Date as well. Its epoch is January 1, 2000, 11:58:55.816 UTC.
If you have a look at one method of calculating the Julian Date, dividing by 86400 is an important step. So, given that the JD system seems to be widely used in astronomy, I think it would be safe to consider this division by 86400 as valid :)
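The role of the division by 86400 shows up in a minimal Python sketch that converts a Unix timestamp to a Julian Date. It relies on the Unix epoch falling at JD 2440587.5, and it ignores leap seconds and the small difference between UTC and Terrestrial Time. The function name is illustrative.

```python
UNIX_EPOCH_JD = 2440587.5   # Julian Date of 1970-01-01 00:00:00 UTC
SECONDS_PER_DAY = 86_400


def unix_to_julian_date(timestamp: float) -> float:
    """Convert seconds since the Unix epoch to a (UTC-based) Julian Date."""
    return timestamp / SECONDS_PER_DAY + UNIX_EPOCH_JD


print(unix_to_julian_date(0))           # 2440587.5
print(unix_to_julian_date(946728000))   # 2451545.0 (2000-01-01 12:00:00 UTC)
```

The second value is the Julian Date usually quoted for J2000, which is defined at noon rather than midnight.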
This is a more complex question than you might initially realize. You want the days since 1970 to be the same for all times during the local day, and you also don't want daylight saving time changes in the local time zone and UTC date changes to affect the output.
The solution I found was to compute the seconds since 1970 in UTC but for the current local date at midnight, not the current UTC date. Here is a Linux shell script solution:
echo $(( $(date -u -d "$(date '+%Y-%m-%d') 00:00:00" '+%s') / 24 / 60 / 60 ))
The outer date -u forces UTC, while the inner date returns the local year-month-day. The division always yields a whole number, even in a language that supports non-integer arithmetic, because midnight UTC timestamps are exact multiples of 86400. Computing the seconds since 1970 in local time, or using the current UTC date (and not the local date), will not work.
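The difference between using the local date and the UTC date can be illustrated in Python. This is a minimal sketch: the fixed UTC-10 offset stands in for a hypothetical rental location, and the helper name is invented for the example. A real application would more likely use a proper time zone database than a fixed offset.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)
LOCAL_TZ = timezone(timedelta(hours=-10))   # hypothetical fixed offset


def epoch_day_of_local_date(instant: datetime, tz: timezone) -> int:
    """Day count of the calendar date that `instant` falls on in `tz`."""
    return (instant.astimezone(tz).date() - EPOCH).days


# 2021-04-23 03:00 UTC is still the evening of 22 April at UTC-10.
instant = datetime(2021, 4, 23, 3, 0, tzinfo=timezone.utc)

utc_day = int(instant.timestamp()) // 86_400
local_day = epoch_day_of_local_date(instant, LOCAL_TZ)
print(utc_day, local_day)   # 18740 18739
```

The two counts disagree by one for this instant, which is the kind of off-by-one-day error the local-midnight approach is meant to avoid.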

Unix timestamp: everywhere the same?

If I request Unix timestamps at the same moment, on any system, in any programming language, anywhere in the world (or the universe), will they always be the same? Or is it possible that the values differ?
As a precondition I assume that each system has its time configured correctly. Additional question: nowadays, can I assume devices with an internet connection have the correct time?
So, how reliable is the Unix timestamp? E.g. if I'd like to set an alert for different users around the world at a certain time and I broadcast just the timestamp, can I assume that the alerts happen in the same second?
(Journeys with speed of light should be disregarded here, I guess.)
Unix timestamps count the number of seconds elapsed since 1970-01-01 00:00:00 UTC (leap seconds are not counted), so if the system time is set correctly, the value should be the same everywhere.
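A short Python sketch can illustrate this: one timestamp is rendered as different wall-clock times under different UTC offsets, yet all of them denote the same instant. The two offsets are arbitrary examples, not tied to any particular user or location.

```python
from datetime import datetime, timedelta, timezone

ts = 1619136000   # 2021-04-23 00:00:00 UTC

utc = datetime.fromtimestamp(ts, tz=timezone.utc)
plus9 = datetime.fromtimestamp(ts, tz=timezone(timedelta(hours=9)))
minus5 = datetime.fromtimestamp(ts, tz=timezone(timedelta(hours=-5)))

print(utc.isoformat())      # 2021-04-23T00:00:00+00:00
print(plus9.isoformat())    # 2021-04-23T09:00:00+09:00
print(minus5.isoformat())   # 2021-04-22T19:00:00-05:00

# Timezone-aware datetimes compare by instant, not by wall-clock reading.
print(utc == plus9 == minus5)   # True
```

Only the display depends on the time zone; the broadcast timestamp itself does not.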

PostgreSQL support for timestamps to nanosecond resolution

The data I'm receiving has timestamps down to nanoseconds (which we actually care about). Is there a way for Postgres timestamps to go to nanoseconds?
As others have pointed out, Postgres doesn't provide such a type out of the box. However, because Postgres is open source, it's relatively simple to create an extension that supports nanosecond resolution. I faced similar issues a while ago and created this timestamp9 extension for Postgres.
It internally stores the timestamp as a bigint and defines it as the number of nanoseconds since the UNIX epoch. It provides some convenience functions around it that make it easy to view and manipulate the timestamps. If you can live with the limited time range that these timestamps can have, between the year 1970 and the year 2262, then this is a good solution.
Disclaimer: I'm the author of the extension
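The upper end of that range follows from the storage format. Here is a minimal Python sketch, assuming a signed 64-bit integer counting nanoseconds since the Unix epoch. It only illustrates the arithmetic and does not use the extension itself.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1            # largest value a signed 64-bit integer holds
EPOCH = datetime(1970, 1, 1)

# Python's datetime resolves microseconds, so drop the last three digits.
last_instant = EPOCH + timedelta(microseconds=INT64_MAX // 1_000)
print(last_instant)              # 2262-04-11 23:47:16.854775
```

A signed 64-bit nanosecond counter therefore covers roughly 292 years after the epoch.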
Nope, but you could truncate the timestamps to milliseconds and store the nanosecond part in a separate column.
You can create an index on both columns, plus a view or function that returns the combined nanosecond timestamp, and you can even create an index on that function.
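Here is a minimal Python sketch of the split described above, assuming the incoming value is an integer count of nanoseconds since the Unix epoch. It truncates to microseconds, PostgreSQL's documented timestamp resolution, rather than milliseconds, and keeps the leftover nanoseconds as a small integer. The function names are hypothetical.

```python
from typing import Tuple


def split_ns(ts_ns: int) -> Tuple[int, int]:
    """Return (whole microseconds, leftover nanoseconds in 0..999)."""
    return divmod(ts_ns, 1_000)


def join_ns(micros: int, extra_ns: int) -> int:
    """Rebuild the original nanosecond count from the two stored parts."""
    return micros * 1_000 + extra_ns


ts_ns = 1_619_136_000_123_456_789
micros, extra_ns = split_ns(ts_ns)
print(micros, extra_ns)                     # 1619136000123456 789
print(join_ns(micros, extra_ns) == ts_ns)   # True
```

Sorting by the pair (timestamp column, remainder column) gives the same order as sorting by the original nanosecond value.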
On the one hand, the documented resolution is 1 microsecond.
On the other hand, PostgreSQL is open source. Maybe you can hack together something to support nanoseconds.

Why can't Perl convert Unix timestamps beyond the year 2038?

I am trying to use the Perl command below to convert epoch times to readable local time:
bash-3.2$ perl -le print\ scalar\ localtime\ 32503651200
Thu Mar 9 19:13:52 1911
Timestamps before the year 2038 are converted correctly, but for timestamps beyond 2038 I don't get the expected result.
Please advise how to fix this. Thanks.
The year 2038 bug on 32-bit systems was worked around in Perl 5.12.0 (64-bit systems are unaffected by the 2038 bug). I know because I did it (with help). :) Simply upgrade your Perl and the problem (and a lot of others) is solved.
Alternatively, use a date library such as DateTime. It does not rely on system time functions (the root of the 2038 bug), is unaffected by the y2038 bug, and is generally much, much easier to use.
If you can't upgrade Perl and must use localtime and gmtime, you can use Time::y2038 to get versions of those functions unaffected by the 2038 bug.
Year 2038 problem
The Year 2038 problem is an issue for computing and data storage situations in which time values are stored or calculated as a signed 32-bit integer, and this number is interpreted as the number of seconds since 00:00:00 UTC on 1 January 1970 ("the epoch"). Such implementations cannot encode times after 03:14:07 UTC on 19 January 2038, a problem similar to but not entirely analogous to the "Y2K problem" (also known as the "Millennium Bug"), in which 2-digit values representing the number of years since 1900 could not encode the year 2000 or later. Most 32-bit Unix-like systems store and manipulate time in this "Unix time" format.
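The 1911 date in the Perl output above is consistent with this kind of overflow. The following Python sketch emulates truncating a value to a signed 32-bit integer; it is an illustration of the arithmetic, not a reproduction of what Perl or the C library does internally, and the helper names are invented.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def wrap_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def utc_from_seconds(seconds: int) -> datetime:
    """UTC date and time for a count of seconds since 1970-01-01."""
    return EPOCH + timedelta(seconds=seconds)


print(utc_from_seconds(2**31 - 1))   # 2038-01-19 03:14:07, the last 32-bit second

wrapped = wrap_int32(32503651200)
print(wrapped)                       # -1856087168
print(utc_from_seconds(wrapped))     # 1911-03-09 12:13:52
```

The wrapped value lands on 9 March 1911 with the same minutes and seconds as the Perl output; the hour differs, presumably because localtime applies the asker's local time zone.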

What Does Y2K Compliant Mean?

I was reading basics of Perl programming language and I came across the following statement
Perl is Y2K compliant.
I did not quite get what it meant, even after some Googling. Is it some kind of established standard? If so, who established it? Any info is appreciated.
For those who were programming in the late 1990s, Y2K was of crucial importance. Literally: Y2K = Year 2000.
Software that was not Y2K-compliant included, most obviously, software that stored year numbers as 2 digits (often to save storage space), and would therefore have equated the year 2000 to the year 1900. However some software products, for other reasons, were not Y2K compliant because they made incorrect date calculations for dates in the 21st and subsequent centuries.
In the latter category, I had one product that I was maintaining at the time that I had to fix because it didn't recognise the year 2000 as a leap year. As that software ran an automatic control system in a manufacturing plant, it would have damaged some expensive components if it hadn't been fixed before the end of February 2000.
There were some apocalyptic forecasts that very bad things would happen on 1 January 2000 because of software failures due to Y2K non-compliance and a lot of people were "holding their breath" at midnight on 31 December 1999 for that reason. After the fact, many people claimed the forecasts had been exaggerated. In my opinion, there were few problems because a lot of programmers worked very hard and long hours in the late-1990s specifically to deal with the threat of Y2K problems, and they would not have done so if there had not been legitimate concerns about potentially very bad outcomes.
The Wikipedia article on Y2K, Year 2000 problem, explains this quite well:
In 1997, The British Standards Institute (BSI) developed a standard, DISC PD2000-1, which defines "Year 2000 Conformity requirements" as four rules:
1. No valid date will cause any interruption in operations.
2. Calculation of durations between, or the sequence of, pairs of dates will be correct whether any dates are in different centuries.
3. In all interfaces and in all storage, the century must be unambiguous, either specified, or calculable by algorithm.
4. Year 2000 must be recognized as a leap year.
Perl being Y2k compliant means that its built-in date handling follows these rules.
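The leap-year rule is easy to get wrong, which a minimal Python sketch can show. The "naive" function is a hypothetical example of the kind of shortcut that fails for the year 2000; it is not taken from any real product.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_year_naive(year: int) -> bool:
    """Buggy shortcut: treats every century year as a common year."""
    return year % 4 == 0 and year % 100 != 0


for year in (1900, 2000, 2100):
    print(year, is_leap_year(year), is_leap_year_naive(year))

# The full rule agrees with the standard library over a wide range of years.
print(all(is_leap_year(y) == calendar.isleap(y) for y in range(1600, 2401)))
```

The two functions agree for 1900 and 2100 and disagree only for 2000, so a bug of this kind could go unnoticed until that year.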