Why can't Perl convert Unix timestamps past the year 2038? - perl

I am trying to use the Perl command below to convert epoch times to a readable local time:
bash-3.2$ perl -le print\ scalar\ localtime\ 32503651200
Thu Mar 9 19:13:52 1911
Timestamps for years before 2038 are converted correctly, but for years greater than 2038 I can't get the expected result.
Please advise how to fix. Thanks.

The year 2038 bug on 32 bit systems was worked around in Perl 5.12.0 (64 bit systems are unaffected by the 2038 bug). I know because I did it (with help). :) Simply upgrade your Perl and the problem (and a lot of others) is solved.
Alternatively, use a date library such as DateTime. It does not rely on the system time functions that are the root of the 2038 bug, so it is unaffected by it, and it is generally much, much easier to use.
If you can't upgrade Perl and must use localtime and gmtime, you can use Time::y2038 to get versions of those functions unaffected by the 2038 bug.

Year 2038 problem
The Year 2038 problem is an issue for computing and data storage situations in which time values are stored or calculated as a signed 32-bit integer, and this number is interpreted as the number of seconds since 00:00:00 UTC on 1 January 1970 ("the epoch"). Such implementations cannot encode times after 03:14:07 UTC on 19 January 2038, a problem similar to but not entirely analogous to the "Y2K problem" (also known as the "Millennium Bug"), in which 2-digit values representing the number of years since 1900 could not encode the year 2000 or later. Most 32-bit Unix-like systems store and manipulate time in this "Unix time" format.
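The cut-off itself is easy to verify by converting the largest value a signed 32-bit counter can hold back into a calendar date. A minimal Python sketch (assuming a build whose own time handling is 64-bit, so the conversion itself is not affected by the bug):
from datetime import datetime, timezone

max_32bit = 2**31 - 1  # 2147483647, the largest signed 32-bit value
print(datetime.fromtimestamp(max_32bit, tz=timezone.utc))
# 2038-01-19 03:14:07+00:00 - the wrap-around point described above;
# one second later a signed 32-bit counter overflows and wraps to a 1901 date.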

Related

What do you call the number of *days* since the unix epoch?

I initially learned that Unix time is the number of seconds that have elapsed since 00:00:00 (UTC) on 1 January 1970. With 24 hours in a day, that means that the unix timestamp grows by 86400 every day.
Then I heard about the concept of leap seconds, and thought that would mean that on some days the unix timestamp might grow by 86401 seconds, but apparently this is not the case. From what I've read, every day is treated as if it contains exactly 86400 seconds. When you get a leap second, the operating system will 'fudge' it in some way to make sure there are still 86400 timestamps - either by making every 'second' that day a little bit longer than a real SI second, or by reporting the same integer timestamp twice in a row.
So I think that this means that every date since 1 Jan 1970 can be mapped to a unique integer which is the timestamp at 00:00:00 (UTC) that day divided by 86400. (guaranteed to be an integer with no remainder because as discussed every day has to have 86400 timestamps). Alternatively you could take any timestamp during that day and calculate floor(timestamp / 86400).
For example, today, Fri 23rd April 2021 - timestamp at 00:00:00 UTC was 1619136000.
As expected, this is a multiple of 86400, and 1619136000 / 86400 = 18740.
There have been 18740 days since the unix epoch.
So my question is:
Does this integer already have a well-known name? Is it already widely used in software for representing dates? I've not been able to find any reference online to this concept.
Is my logic here correct - is there really a unique integer for each date, and can you easily calculate it in your code as timestamp_at_midnight_utc / 86400? Or is there some subtle problem that I've overlooked?
My motivation here is that I often have to do complicated calculations involving lots of dates without any time information (I work for a vacation rentals company where each unit has its own availability calendar). I think I could make a lot of efficiency improvements in my code if I were working with integers uniquely representing a date, instead of DateTime objects or strings like '2021-04-23'.
Yes, your logic is correct. Where I still get worried is that it requires you to do your calculations in UTC. Holiday rentals happen in a time zone, and associating a date in that time zone with the start of the day in UTC instead could get confusing soon.
And yes, the concept of a count of days since 1970-01-01 is sometimes used, though not often that I have seen.
In the Java documentation the terms “epoch day” and “epoch day count” are used, but this doesn’t make these terms a standard.
I think that the first avenue for you to consider is whether either your programming language comes with a library for counting days without the need to convert to and from seconds, or there is a trustworthy third-party library that you may use for the purpose.
This Java snippet confirms your calculation:
import java.time.LocalDate;
import java.time.Month;

// A LocalDate in Java is a date without time zone or UTC offset
LocalDate date = LocalDate.of(2021, Month.APRIL, 23);
long epochDayCount = date.toEpochDay();
System.out.println("Epoch day: " + epochDayCount);
Output agrees with the result you got:
Epoch day: 18740
Link: Epoch day count in the Java documentation.
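For comparison, the same epoch-day count can be checked with nothing but the Python standard library (a sketch, not tied to any particular framework):
from datetime import date

EPOCH = date(1970, 1, 1)

def epoch_day(d):
    # Whole days elapsed since 1970-01-01
    return (d - EPOCH).days

print(epoch_day(date(2021, 4, 23)))  # 18740, matching the Java result
print(1619136000 // 86400)           # 18740, matching the division in the question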
From my experience there is no official name for "days since epoch". Some nuances that can be detected about UNIX time (and its measurement units):
1. It appears to be (relatively) officially defined as the number of seconds since the UNIX epoch.
2. The main purpose of the UNIX time mechanism (regardless of measurement unit conventions) is to define a point in time.
3. In the context of point #2, in practice, it has already become traditional that the UNIX timestamp is often returned in milliseconds.
There are several factors that can influence the measurement unit that is available to you:
design decisions by APIs, libraries and programming languages
time resolution / clock frequency of the software & hardware that you are running on - e.g. some circuits, controllers or other entities aren't able to reach millisecond resolution or they don't have enough bits available in memory to represent big numbers.
performance reasons - offering a time service at millisecond or second resolution via HTTP might prove too much for networks / server CPUs. The next best thing would be a UNIX timestamp in minutes. This value can then be cached by intermediary caches for the duration of 1 minute.
use cases - there are epochs (e.g. in astronomy) where the day is the main measurement unit.
Here are a few examples of such day-based epochs:
The Julian Day system - which has a non-integer Julian Date (JD) but an integer Julian Day Number (JDN). Its epoch is at noon 24 November 4714 BC.
J2000 epoch - measured via Julian Date as well. Its epoch is January 1, 2000, 11:58:55.816 UTC.
If you have a look at one method of calculating the Julian Date, dividing by 86400 is an important step. So, given that the JD system seems to be widely used in astronomy, I think it would be safe to consider this division by 86400 as valid :)
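To make that connection concrete, here is a small Python sketch of the usual conversion, using the commonly cited value 2440587.5 as the Julian Date of the Unix epoch:
UNIX_EPOCH_JD = 2440587.5  # Julian Date of 1970-01-01T00:00:00 UTC

def unix_to_julian_date(unix_seconds):
    # Fractional Julian Date for a given Unix timestamp
    return unix_seconds / 86400 + UNIX_EPOCH_JD

print(unix_to_julian_date(0))           # 2440587.5
print(unix_to_julian_date(1619136000))  # 2459327.5, i.e. 2021-04-23 00:00:00 UTC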
This is a more complex question than you might initially realize. You want the days since 1970 to be the same for all times during the local day, and you also don't want daylight saving time changes in the local time zone and UTC date changes to affect the output.
The solution I found was to compute the seconds since 1970 in UTC but for the current local date at midnight, not the current UTC date. Here is a Linux shell script solution:
echo $(( $(date -u -d "$(date '+%Y-%m-%d') 00:00:00" '+%s') / 24 / 60 / 60 ))
date -u forces UTC, while the inner date returns the local year-month-day. The division actually yields an integer result, even if you evaluate it with arithmetic that supports non-integers. Computing the seconds since 1970 in local time, or using the current UTC date (and not the local day), will not work.
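The same idea - take the local calendar date and count whole days rather than raw seconds - can be sketched in Python (it assumes the machine's local time zone is the one you care about):
from datetime import date

def days_since_epoch_for_local_today():
    # Count whole days from 1970-01-01 to today's *local* calendar date,
    # so DST shifts and the current UTC date cannot skew the result.
    return (date.today() - date(1970, 1, 1)).days

print(days_since_epoch_for_local_today())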

Date delta human readable format

I'm looking for a way to output how old something is for some reporting. I'm pretty open to approaches, libraries, packages, and languages, though bash, golang, and python would be preferred. I have already researched many different ways to do this, but I keep seeing the same results, which are not my ideal output. The outputs I am seeing are something like 45 days ago, or 1.06 months ago.
I'm looking for a way to output hours, days, weeks, months, and years. An example of this output would be "Last ran: 1 month, 2 weeks, 4 days, and 9 hours ago". Right now I'm getting the dates as epoch timestamps from the bash date tool, so I can do any kind of conversion necessary on the input. For the sake of example, here are some inputs: 2020-03-19T05:12:14, 2019-08-27T08:47:27, 2020-05-12T11:10:18, 2020-02-01T07:40:01. I'm trying to get how old these timestamps are relative to the current time/date - essentially what date outputs, or the equivalent in another language.
More examples to demonstrate what I don't want: echo $(( ($(date +%s) - $(date -d "2020-08-27T05:47:27" +%s) )/(60*60*24) )) => 16. Instead I want 2 weeks and 2 days. The same goes for something over a month or year old. Hours is the smallest increment that I really care about.
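One way to get that kind of breakdown, sketched here in Python with the third-party python-dateutil package (an assumption on my part, since the question is open to any language), is to let relativedelta split the difference into calendar units and derive weeks from its day count:
from datetime import datetime
from dateutil.relativedelta import relativedelta  # third-party: python-dateutil

def human_age(timestamp, now=None):
    # Describe how old an ISO timestamp like '2020-03-19T05:12:14' is
    then = datetime.fromisoformat(timestamp)
    now = now or datetime.now()
    delta = relativedelta(now, then)
    weeks, days = divmod(delta.days, 7)
    parts = []
    for amount, unit in [(delta.years, "year"), (delta.months, "month"),
                         (weeks, "week"), (days, "day"), (delta.hours, "hour")]:
        if amount:
            parts.append(f"{amount} {unit}" + ("s" if amount != 1 else ""))
    return ", ".join(parts) + " ago" if parts else "just now"

print(human_age("2020-03-19T05:12:14"))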

Why is the kdb+ epoch date 2000.01.01?

I am new to kdb+ and I was wondering why the epoch date of 2000.01.01 for kdb is different to that of unix (1970.01.01).
Does the difference affect any interactions with the operating system or other languages?
KDB+ uses a different epoch because it follows a different standard.
KDB+ follows the J2000 international standard, which is based on the Julian year.
UNIX uses POSIX time, which was initially a 32 bit unsigned integer. Because the time was calculated to 60ths of a second, the 32 bit integer would only work for about 829 days, so a recent date had to be chosen.
The first edition Unix Programmer's Manual dated November 3, 1971 defines the Unix time as "the time since 00:00:00, Jan. 1, 1971, measured in sixtieths of a second".
This difference can cause issues if you don't make sure to convert to one standard before piping the chosen epoch time into applications.
Issues regarding interactions with the system or other languages should be manageable given that kdb+ can parse UNIX epoch timestamps.
from http://code.kx.com/q/ref/casting/#tok
Parsing Unix timestamps (from seconds since Unix epoch), string with
9…11 digits:
q)"P"$"10129708800"
2290.12.31D00:00:00.000000000
q)"P"$"00000000000"
1970.01.01D00:00:00.000000000
Kdb+ is available on many different operating systems: currently Windows, Linux-x86, Linux-ARM and OSX builds are available to download from kx, with Solaris previously available.
From the page about system time on Wikipedia we can see that various operating systems use differing epoch dates and ranges. Considering two OS that kdb+ supports we can see that they have differing epoch ranges:
OS            Epoch or range
-------------------------------------------------
Unix/Posix    1 January 1970 to 19 January 2038
Windows       1 January 1601 to AD 30,828
Using either the Linux or the Windows epoch would mean that the other did not match up anyway. Further reading on that page also shows that many other languages use their own distinct epoch dates and ranges.
In short, there is no reason a language needs to use the epoch time of the OS it is running on.
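If you ever need to bridge the two epochs yourself rather than letting kdb+ cast for you, the fixed offset between 1970-01-01 and 2000-01-01 is all that is required. A hedged Python sketch of the idea:
from datetime import datetime, timezone

# 2000-01-01T00:00:00 UTC as a Unix timestamp: 10957 days
# (30 years plus 7 leap days, 1972-1996) * 86400 seconds = 946684800
KDB_EPOCH_OFFSET = 946_684_800

def unix_to_kdb_seconds(unix_seconds):
    # Seconds relative to the kdb+ epoch (2000-01-01) from a Unix timestamp
    return unix_seconds - KDB_EPOCH_OFFSET

ts = int(datetime(2021, 4, 23, tzinfo=timezone.utc).timestamp())
print(ts, unix_to_kdb_seconds(ts))  # 1619136000 672451200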

What Does Y2K Compliant Mean?

I was reading the basics of the Perl programming language and came across the following statement:
Perl is Y2K compliant.
I did not quite get what it meant even after some Googling. Is it some kind of established standard? If yes, then by whom? Any info is appreciated.
For those who were programming in the late 1990s, Y2K was of crucial importance. Literally: Y2K = Year 2000.
Software that was not Y2K-compliant included, most obviously, software that stored year numbers as 2 digits (often to save storage space), and would therefore have equated the year 2000 to the year 1900. However some software products, for other reasons, were not Y2K compliant because they made incorrect date calculations for dates in the 21st and subsequent centuries.
In the latter category, I had one product that I was maintaining at the time that I had to fix because it didn't recognise the year 2000 as a leap year. As that software ran an automatic control system in a manufacturing plant, it would have damaged some expensive components if it hadn't been fixed before the end of February 2000.
There were some apocalyptic forecasts that very bad things would happen on 1 January 2000 because of software failures due to Y2K non-compliance and a lot of people were "holding their breath" at midnight on 31 December 1999 for that reason. After the fact, many people claimed the forecasts had been exaggerated. In my opinion, there were few problems because a lot of programmers worked very hard and long hours in the late-1990s specifically to deal with the threat of Y2K problems, and they would not have done so if there had not been legitimate concerns about potentially very bad outcomes.
The Wikipedia article on Y2K, Year 2000 problem, explains this quite well:
In 1997, The British Standards Institute (BSI) developed a standard, DISC PD2000-1, which defines "Year 2000 Conformity requirements" as four rules:
No valid date will cause any interruption in operations.
Calculation of durations between, or the sequence of, pairs of dates will be correct whether any dates are in different centuries.
In all interfaces and in all storage, the century must be unambiguous, either specified, or calculable by algorithm
Year 2000 must be recognized as a leap year
Perl being Y2k compliant means that its built-in date handling follows these rules.
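The leap-year rule is the one that tripped up code which only checked divisibility by 4 or stopped at the century exception; the full Gregorian rule, and hence why 2000 counts as a leap year, fits in a couple of lines of Python:
def is_leap(year):
    # Gregorian rule: divisible by 4, except centuries, except every 400 years
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap(1900), is_leap(2000), is_leap(2024))  # False True True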

What is the best way to store dates in MongoDB?

I am just starting to learn about MongoDB and hoping to slowly migrate from MySQL.
In MySQL, there are two different data types - DATE ('0000-00-00') and DATETIME ('0000-00-00 00:00:00'). In my MySQL database I use the DATE type, but I am not sure how to transfer it into MongoDB. In MongoDB, there is a Date object, which is comparable to DATETIME. It seems it would be most appropriate to use Date objects, but that would waste space, since hours, minutes, and seconds are not utilized. On the other hand, storing dates as strings seems wrong.
Is there a golden standard on storing dates ('0000-00-00') in MongoDB?
I'm actually in the process of converting a MongoDB database where dates are stored as proper Date() types to instead store them as strings in the form yyyy-mm-dd. Why, considering that every other answerer says that this is a horrible idea? Simply put, because of the neverending pain I've been suffering trying to work with dates in JavaScript, which has no (real) concept of timezones. I had been storing UTC dates in MongoDB, i.e. a Date() object with my desired date and the time set as midnight UTC, but it's unexpectedly complicated and error-prone to get a user-submitted date correctly converted to that from whatever timezone they happen to be in. I've been struggling to get my JavaScript "whatever local timezone to UTC" code to work (and yes, I'm aware of Sugar.js and Moment.js) and I've decided that simple strings like the good old MySQL standard yyyy-mm-dd is the way to go, and I'll parse into Date() objects as needed at runtime on the client side.
Incidentally, I'm also trying to sync this MongoDB database with a FileMaker database, which also has no concept of timezones. For me the simplicity of simply not storing time data, especially when it's meaningless like UTC midnight, helps ensure less-buggy code even if I have to parse to and from the string dates now and then.
BSON (the storage format used natively by Mongo) has a dedicated date type, UTC datetime, which is a 64-bit (so, 8-byte) signed integer denoting milliseconds since the Unix epoch. There are very few valid reasons why you would use any other type for storing dates and timestamps.
If you're desperate to save a few bytes per date (again, with mongo's padding and minimum block size and everything this is only worth the trouble in very rare cases) you can store dates as a 3 byte binary blob by storing it as an unsigned integer in YYYYMMDD format, or a 2 byte binary blob denoting "days since January 1st of year X" where X must be chosen appropriately since that only supports a date range spanning 179 years.
EDIT: As the discussion below demonstrates this is only a viable approach in very rare circumstances. Basically; use mongo's native date type ;)
If you really care about saving 4 bytes per field (in case you have many DATE fields per document) you can store dates as int32 fields in the form 20110720 (note that MySQL DATE occupies 3 bytes, so storage will be greater in any case). Otherwise I'd stick to the standard datetime type.
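For what it's worth, if you do go down the packed-integer route described in the last two answers, the encoding and decoding are trivial; here is a minimal Python sketch of the YYYYMMDD idea (a sketch only - Mongo's native date type is still the usual recommendation):
from datetime import date

def date_to_int(d):
    # Pack a calendar date into an int32-friendly YYYYMMDD integer
    return d.year * 10000 + d.month * 100 + d.day

def int_to_date(n):
    # Unpack a YYYYMMDD integer back into a date
    return date(n // 10000, (n // 100) % 100, n % 100)

packed = date_to_int(date(2011, 7, 20))
print(packed)               # 20110720, the form mentioned above
print(int_to_date(packed))  # 2011-07-20
A nice side effect is that these integers sort in chronological order, so range queries on such a field still behave as expected.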