I'm looking for an overview of the internal data representation and earliest/latest dates supported by typical time libraries in different programming languages.
I can remember reading a webpage about that a while back but can't find it any more after dozens of Google search term refinements.
I haven't read the code of all the major libraries, so I can't be sure, but I guess many of them store the data as fields, mostly numeric, such as year, month, day, hour, minute, etc.
The limits are either the upper/lower bounds of the respective numerical types, or some artificial value (such as "year 1 million" to represent a "very far future").
I think it also depends on the types being represented:
If the type represents only a local date (day/month/year, with no time of day and no timezone/offset), then the limits should be the upper/lower bounds of the type used to store the year value, or some artificial value.
If the type represents a timestamp (a count of seconds, milliseconds, or whatever sub-second precision the API supports since the Unix epoch), the limit can be the maximum value of the type used to store this count, or some artificial limit value.
And so on. But each of the cases above might give you a different limit, and the consistency between limits from different types will depend on the API.
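For a plain count of seconds or milliseconds since the Unix epoch, the width of the integer type alone sets the limit. The following Java sketch (class and method names are made up for illustration) computes the last instant that a signed 32-bit count of seconds and a signed 64-bit count of milliseconds can hold:

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Illustrative only: shows how the width of the storage type caps a timestamp.
class TimestampLimits {
    // Latest instant a signed 32-bit count of seconds since the Unix epoch can hold.
    static Instant maxInt32Seconds() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE);
    }

    // Latest instant a signed 64-bit count of milliseconds can hold
    // (the representation behind java.util.Date and System.currentTimeMillis()).
    static Instant maxInt64Millis() {
        return Instant.ofEpochMilli(Long.MAX_VALUE);
    }

    // Year of an instant, evaluated in UTC.
    static int yearOf(Instant instant) {
        return instant.atZone(ZoneOffset.UTC).getYear();
    }
}
```

The 32-bit case ends at 2038-01-19T03:14:07Z (the well-known "year 2038 problem"), while 64-bit milliseconds reach the year 292278994, so two libraries using the same epoch can still have wildly different limits.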
Is that what you're asking?
Some APIs, such as Java's, have documented limits for each type:
https://docs.oracle.com/javase/8/docs/api/java/time/LocalDateTime.html#MIN
https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html#MIN
And note how the limit is different for each type.
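A small sketch that prints those documented constants and shows one consequence of them: Instant's range is wider than LocalDateTime's, so the extreme Instant cannot be converted. The class and method names are hypothetical; the MIN/MAX constants are the ones linked above.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Compares the documented MIN/MAX constants of a few java.time types.
class JavaTimeLimits {
    static void printLimits() {
        System.out.println("LocalDate:     " + LocalDate.MIN + " .. " + LocalDate.MAX);
        System.out.println("LocalDateTime: " + LocalDateTime.MIN + " .. " + LocalDateTime.MAX);
        System.out.println("Instant:       " + Instant.MIN + " .. " + Instant.MAX);
    }

    // Instant.MIN lies outside LocalDateTime's range, so this conversion throws.
    static boolean instantMinFitsInLocalDateTime() {
        try {
            LocalDateTime.ofInstant(Instant.MIN, ZoneOffset.UTC);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }
}
```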
I'm having trouble enforcing different time resolutions for different rdfs:Classes. I have a graph where:
:event a rdfs:Class.
:subevent rdfs:subClassOf :event.
and related SHACL shapes, where an event's occurrence is recorded only at date resolution, whereas a subevent is a more precisely defined point in time:
:eventSH a sh:NodeShape;
    sh:targetClass :event;
    sh:property [
        sh:path :happeningOn;
        sh:datatype xsd:date;
        sh:minCount 1;
        sh:maxCount 1;
    ].
:subeventSH a sh:NodeShape;
    sh:targetClass :subevent;
    sh:property [
        sh:path :happeningOn;
        sh:datatype xsd:dateTime;
        sh:minCount 1;
        sh:maxCount 1;
    ].
So, in an ontological sense, I need to express events at varying resolutions (some events are only known to occur in a certain year, some on a certain date, and some at a precise point in time).
In essence, the question is: can SHACL express a constraint where the subevent's time point must fall inside the superclass's date? Is SHACL-SPARQL the only option for this? I understand that years, months, days and dates are by nature different beasts from dateTime: they are not points but ranges between two points in time.
I can't seem to find a function to convert dateTime to date. Perhaps just casting to xsd:date would do it, but I'm not sure whether most engines support this in a unified way. So my primary question is: is this requirement of different resolutions for the same inherited predicate achievable in pure SHACL? Or should I resort to using different predicates, e.g. with the help of the OWL Time ontology? That seems like an unnecessary complication compared to just using pure SHACL.
Edit: to clarify, I do recognize that, as written, no subevent can ever conform, because the shapes that apply to it are contradictory.
For this scenario you cannot use sh:datatype. Subclasses can only narrow down the constraints from superclasses. So if the superclass allows xsd:date, then the subclass cannot constrain it further to xsd:dateTime. While it may sound intuitive to expect dateTimes to be a "subset" of dates, this is not how SHACL works, because it compares the exact datatypes only, i.e. the URI of the datatype must match.
I also believe it would be very unusual to have a property that is either xsd:date or xsd:dateTime, depending on context. This makes it harder for applications to process. For example, imagine an algorithm that works against event and doesn't know about subevent. Such an algorithm works best if it can always assume xsd:date literals. One design alternative would be to define two properties, where the xsd:date property is always present (even for instances of the subclass), while the subclass may have another property to represent more details.
BTW, to convert from xsd:dateTime to xsd:date, you can use xsd:date as a cast function in SPARQL: BIND (xsd:date(NOW()) AS ?date). Note that SPARQL 1.1 only requires a fixed set of XSD constructor functions and xsd:date is not among them, so check that your engine supports it.
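sh:datatype cannot express "this time point falls inside that date", but the check itself is simple interval logic, because a date is the half-open range from its midnight to the next midnight. A Java sketch with hypothetical names, assuming timezone-less values (real xsd:dateTime literals may carry offsets, which this ignores):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical helper: does a precise time point fall inside a calendar date?
class WithinDate {
    // Interval view: the date is [00:00 on that day, 00:00 on the next day).
    static boolean happensWithin(LocalDateTime point, LocalDate day) {
        LocalDateTime start = day.atStartOfDay();
        LocalDateTime end = day.plusDays(1).atStartOfDay();
        return !point.isBefore(start) && point.isBefore(end);
    }

    // Cast view: truncate the point to its date and compare, analogous to
    // comparing xsd:date(?dateTime) with a date literal in SPARQL.
    static boolean sameDate(LocalDateTime point, LocalDate day) {
        return point.toLocalDate().equals(day);
    }
}
```

Both methods give the same answer for timezone-less values; in SHACL, the cast view would presumably have to be written as a SHACL-SPARQL constraint, subject to the engine support noted above.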
While discussing date/time formats, someone told me he stores datetimes in a "human-readable" format as floats of the form yyyymmdd.hhmmss, so 2019-09-18, 11:29:30am becomes 20190918.112930.
I'm trying to find out whether this guy invented his own format or whether it is used (and described) elsewhere too, and if so, what it is called.
It’s probably homespun
I have seen a lot of date and time formats, and I have not seen this one before. My guess is that this guy or his organization invented it themselves.
Edit: Thank you for confirming in the comments. Since comments on Stack Overflow are not always permanent, I quote here what you said:
Finally got confirmation from the source: it's homespun indeed.
As an aside, I don’t like it. A float is stored in a binary format internally, and only after formatting it into decimal does it become human readable. Using a float for a “human readable” date and time is not what formatting of floating-point numbers was meant for; it’s a hack.
Use ISO 8601
For a human-readable format I recommend ISO 8601. Here 2019-09-18, 11:29:30am becomes 2019-09-18T11:29:30. Or even better, and still within ISO 8601, convert to UTC and append a Z to denote UTC. So if your original time was in the Europe/Berlin time zone, it would become 2019-09-18T09:29:30Z. As you can see, ISO 8601 is even more human readable than your friend’s format, and it is sortable as strings (as long as the years don’t go beyond 9999).
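A minimal Java sketch of that conversion, assuming the homespun value arrives as a string (not a float) and that the zone it was recorded in is known; the class and method names are made up for illustration:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Converts the homespun "yyyymmdd.hhmmss" text into ISO 8601.
class ToIso8601 {
    // 'uuuu' is the proleptic year; '.' is a literal separator in the pattern.
    private static final DateTimeFormatter HOMESPUN = DateTimeFormatter.ofPattern("uuuuMMdd.HHmmss");

    // Local ISO 8601 form, without zone information.
    static String toIsoLocal(String homespun) {
        return LocalDateTime.parse(homespun, HOMESPUN).toString();
    }

    // UTC ISO 8601 form with a trailing Z, given the zone the value was recorded in.
    static String toIsoUtc(String homespun, String zoneId) {
        Instant instant = LocalDateTime.parse(homespun, HOMESPUN)
                .atZone(ZoneId.of(zoneId))
                .toInstant();
        return instant.toString();
    }
}
```

With "20190918.112930" this yields 2019-09-18T11:29:30 locally and 2019-09-18T09:29:30Z for Europe/Berlin (UTC+2 in September), matching the example above.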
While he may have come up with it himself, it is also a formatting option in zipinfo.
The manual doesn't explicitly name it, but describes it as a "sortable decimal format".
I'm not sure whether we are talking about SQL date formats, but if so, this format is also available in SQL statements.
I'm not sure about the name; it is labelled in different ways: "non-standard", "ISO", "Other format", and so on.
It is also available in PHP.
According to Wikipedia, this is similar to ISO 8601, which permits all of the following for a combined date and time:
2019-09-18T09:18:26+00:00
2019-09-18T09:18:26Z
20190918T091826Z
except that the T separating the date from the time is replaced by a . and the time-zone information is dropped.
That specific format has limited popularity, whether written as yyyymmdd.HHMMSS or in the C strftime()-compatible form %Y%m%d.%H%M%S.
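To make the comparison concrete, this Java sketch formats the same UTC instant in the ISO 8601 extended forms, the ISO 8601 basic form, and the homespun form. The formatter patterns are standard java.time patterns; the class name and the chosen instant (taken from the examples above) are illustrative.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Formats one instant in several ISO 8601 variants and in the homespun variant.
class Iso8601Forms {
    // The instant from the examples above, in UTC.
    static final OffsetDateTime EXAMPLE = OffsetDateTime.of(2019, 9, 18, 9, 18, 26, 0, ZoneOffset.UTC);

    // 'xxx' prints the offset as +HH:MM, even for UTC.
    static final DateTimeFormatter EXTENDED_OFFSET = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssxxx");
    // Prints Z for a zero offset.
    static final DateTimeFormatter EXTENDED_Z = DateTimeFormatter.ISO_OFFSET_DATE_TIME;
    // Basic form: no separators; 'X' prints Z for a zero offset.
    static final DateTimeFormatter BASIC = DateTimeFormatter.ofPattern("uuuuMMdd'T'HHmmssX");
    // Homespun variant: '.' instead of 'T', and no zone designator.
    static final DateTimeFormatter HOMESPUN = DateTimeFormatter.ofPattern("uuuuMMdd.HHmmss");

    static String[] allForms(OffsetDateTime t) {
        return new String[] {
            t.format(EXTENDED_OFFSET), t.format(EXTENDED_Z), t.format(BASIC), t.format(HOMESPUN)
        };
    }
}
```

For EXAMPLE this produces 2019-09-18T09:18:26+00:00, 2019-09-18T09:18:26Z, 20190918T091826Z and 20190918.091826; only the last one loses the information that the value is in UTC.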
EDIT
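As for using a float for date and time the way you suggest, whether it works depends on the precision and the machine representation.
If the system follows the IEEE 754 standard (as most modern C compilers do), you need at least a 64-bit float (double): a 32-bit float has only a 24-bit significand, so it cannot represent every eight-digit date exactly, let alone the time part.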
However, it is not common to do so.
This might be partly because it is hard to predict the accuracy of the time information, and because it is not as bit-efficient as Unix time.
Given that its only advantage is that it can be printed with the standard %f conversion of sprintf() (whose default six decimal places happen to match hhmmss), I would only see it as useful when strftime() is unavailable or a performance bottleneck.
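The precision point can be checked directly. Below is a Java sketch of the encoding (the encode/decode names are mine, and it assumes positive four-digit years): a double round-trips the value once the binary error is rounded away, while narrowing to a 32-bit float drops the time of day completely.

```java
import java.time.LocalDateTime;

// Sketch of the yyyymmdd.hhmmss-as-a-number encoding, to show the precision issue.
// Assumes positive four-digit years; negative years would break the arithmetic.
class FloatDateTime {
    static double encode(LocalDateTime t) {
        long datePart = t.getYear() * 10_000L + t.getMonthValue() * 100L + t.getDayOfMonth();
        long timePart = t.getHour() * 10_000L + t.getMinute() * 100L + t.getSecond();
        return datePart + timePart / 1_000_000.0;
    }

    static LocalDateTime decode(double value) {
        // Scale to an integer yyyymmddhhmmss; rounding removes the binary representation error.
        long n = Math.round(value * 1_000_000.0);
        long datePart = n / 1_000_000L;
        long timePart = n % 1_000_000L;
        return LocalDateTime.of(
                (int) (datePart / 10_000), (int) (datePart / 100 % 100), (int) (datePart % 100),
                (int) (timePart / 10_000), (int) (timePart / 100 % 100), (int) (timePart % 100));
    }
}
```

Around 2 × 10^7 a 32-bit float can only represent even integers, so (float) 20190918.112930 becomes 20190918.0 and decodes to midnight.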
I'm working on a little side project, which gives me the opportunity to design my very own API. Even if it is not a business endeavor, it is a chance for me to learn more about REST, resources, collections and URIs.
My service records data points organized in time series and will soon provide an API to easily query ranges of data points from specific series. Data points are immutable and as such should be very good candidates for caching. A time series can be updated only during a limited time window, after which it is archived and becomes read-only (making it cacheable as well).
I have been looking into the APIs of some companies that provide the same kind of services, and I found the following two patterns:
Define the series in the path and the range in the query:
/series/:id?from=2017-01-26&to=2017-01-27
This is pretty much what most services out there use. I understand it as the series being the resources/collections that are then sliced to a specific range. This seems very easy to use from a consumer's point of view, but from a data point of view, the dates in the query are part of some kind of organization or hierarchy and should, in this case, be part of the path.
Define the series and coordinates in the path:
/series/:x/:y/:z
I didn't find examples of this for time series, but it is the kind of structure used by tile-based map services. To me, this means that each combination of x, y and z is a different collection, which may or may not contain the same resources as another. It also maps directly to a hierarchy: /series/:x contains all the series with a specific value of x and any value of y and z.
I really like the idea of method 2, and I started with something like:
/series/:id (all data points from a specific series)
/series/:id/:year (all the data points from a specific series and year)
/series/:id/:year/:month
/series/:id/:year/:month/:day
...
Which works pretty well for querying predefined ranges such as "all the data points from 2016" or "all the data points from January 2016". Issues arise when trying to query arbitrary ranges like "all the data points from January 2016 to March 2016".
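On the server side, each of those predefined paths maps naturally to a half-open date range. A Java sketch of that mapping, where the class and method names are hypothetical and the segments are assumed to arrive already split from the path:

```java
import java.time.LocalDate;

// Hypothetical mapping from /series/:id/:year[/:month[/:day]] to a half-open date range.
class PathRange {
    final LocalDate start;        // inclusive
    final LocalDate endExclusive; // exclusive

    PathRange(LocalDate start, LocalDate endExclusive) {
        this.start = start;
        this.endExclusive = endExclusive;
    }

    // segments: the year, optionally followed by month and day, as they appear in the path.
    static PathRange fromSegments(String... segments) {
        int year = Integer.parseInt(segments[0]);
        switch (segments.length) {
            case 1:
                return new PathRange(LocalDate.of(year, 1, 1), LocalDate.of(year + 1, 1, 1));
            case 2: {
                LocalDate first = LocalDate.of(year, Integer.parseInt(segments[1]), 1);
                return new PathRange(first, first.plusMonths(1));
            }
            case 3: {
                LocalDate day = LocalDate.of(year, Integer.parseInt(segments[1]), Integer.parseInt(segments[2]));
                return new PathRange(day, day.plusDays(1));
            }
            default:
                throw new IllegalArgumentException("expected 1 to 3 segments");
        }
    }
}
```

A fixed rule like this always reads three segments as year/month/day, which is exactly why mixing in from/to segments creates the ambiguity described below.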
My first attempt was to simply add the start and end to the path:
/series/:id/:year (all the data points from a specific year)
/series/:id/:fromyear/:toyear (all the data points between fromyear and toyear)
But:
It becomes very long, very quickly, for example /series/:id/:fromyear/:frommonth/:fromday/:toyear/:tomonth/:today, and potentially very cumbersome depending on the chosen structure: /series/:id/:fromyear/:toyear/:frommonth/:tomonth/:fromday/:today
It doesn't make any sense from a hierarchy or structure point of view. In /series/1/2014/2016, 2016 is not a subset of 2014 and this collection is actually going to return data points from 2014, 2015 and 2016.
It is tricky to handle on the server side. Is /series/1/2016/01/02 supposed to return all the data points for January 2nd or for the whole January-to-February range?
After noticing how GitHub references specific lines or ranges of lines in its URL fragments, I played with the idea of defining ranges as different collections, such as:
/series/:id/:year/:month (same as before)
/series/:id/:year/:frommonth-:tomonth (to get a specific range)
/series/:id/:year/-:tomonth (to get everything from the beginning of the year to tomonth)
/series/:id/:year/:frommonth- (to get everything from frommonth to the end of the year)
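Parsing such a range segment stays simple. A Java sketch for the month segment, where the class name and the {start, endExclusive} return convention are my own assumptions:

```java
import java.time.LocalDate;

// Hypothetical parser for the month segment of /series/:id/:year/<segment>, where
// <segment> is "MM", "MM-MM", "-MM" or "MM-". Returns {start inclusive, end exclusive}.
class MonthRangeSegment {
    static LocalDate[] parse(int year, String segment) {
        int from;
        int to;
        int dash = segment.indexOf('-');
        if (dash < 0) {
            from = to = Integer.parseInt(segment);               // single month
        } else {
            String left = segment.substring(0, dash);
            String right = segment.substring(dash + 1);
            from = left.isEmpty() ? 1 : Integer.parseInt(left);  // "-MM": from January
            to = right.isEmpty() ? 12 : Integer.parseInt(right); // "MM-": through December
        }
        if (from > to) {
            throw new IllegalArgumentException("empty range: " + segment);
        }
        LocalDate start = LocalDate.of(year, from, 1);
        LocalDate endExclusive = LocalDate.of(year, to, 1).plusMonths(1);
        return new LocalDate[] { start, endExclusive };
    }
}
```

Because the end is exclusive, adjacent segments such as "01-03" and "04-06" never overlap.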
Now, the questions:
Does my solution break any REST or Semantic URL rules/notions/ideas?
Does it improve caching in any way compared to using ranges in the query?
Does it hurt usability for consumers?
Is it unnatural, or does it go against unwritten rules of some frontend frameworks?
I can't find any information in the official documentation on how to write conditions on fact fields of type java.util.Date in guided rules. For example, how do I compare such a field to the current date, check whether it is equal ignoring the time, or check whether it is before some point relative to now?
Drools isn't a real-time program and it doesn't have an innate idea of Time or Now. If you need to investigate relations of some fact property w.r.t. some point of time X, you'll have to establish a fact carrying X as its data, and write your rules based on that.
A more or less coarse approximation of a fact representing Now can be made using timers. You can implement a rule that modifies a fact containing a value representing Time (e.g. java.util.Date) every second, or less frequently.
Masking out the time of day is something you'll have to do using Java or DRL functions. Alternatively, if it is days you are interested in, use some custom class representing days, with some suitable day 1 defined by you.
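The Java side of that advice could look like the following sketch. DateChecks and its methods are hypothetical helpers, not part of Drools; "now" is passed in explicitly, for example from the Time fact that a timer rule keeps up to date. How you call such helpers from a guided rule depends on your Drools setup.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

// Hypothetical static helpers that rule conditions could call.
// Note: java.sql.Date (a Date subclass) throws on toInstant(), so pass plain java.util.Date.
final class DateChecks {
    private DateChecks() {}

    // Same calendar day in the given zone, ignoring the time of day.
    static boolean sameDay(Date a, Date b, ZoneId zone) {
        return toLocalDate(a, zone).equals(toLocalDate(b, zone));
    }

    // True if 'date' lies more than 'amount' before 'now', e.g. older than 30 days.
    static boolean olderThan(Date date, Date now, Duration amount) {
        return date.toInstant().isBefore(now.toInstant().minus(amount));
    }

    private static LocalDate toLocalDate(Date d, ZoneId zone) {
        Instant instant = d.toInstant();
        return instant.atZone(zone).toLocalDate();
    }
}
```

The zone parameter matters: the same two instants can fall on one day in UTC and on two different days elsewhere.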
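You can write a condition like:
inputDate >= "11-Nov-2014"
(in DRL the date literal is quoted and uses the default dd-MMM-yyyy format), and supply your current date as the inputDate field of the rule's input fact.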
I've been experimenting with a number of NLP text parsers, but have found that most fail at even some of the simplest tasks that occur in real texts (texts that haven't been preprocessed to show how "great" the systems are). An example is the following:
From Sundays until Thursdays every week
I've yet to find a single parser that can parse this correctly. I've tried quite a few, including Stanford's SUTime. Can anyone recommend software that can handle natural-language dates?
I did not find one either when I went looking, so I wrote my own. It's part of my natural language engine for .NET.
Here's what the demo shows when you enter that phrase (qualified to next week rather than every week; it can handle that too, but the result is infinite):
Some comments:
1) Handling all possible English-language temporal expressions is a huge task. I've been working on this problem for years to come up with a clean way to represent temporal expressions, plus the many rules needed to parse English expressions of time.
2) In addition to finding a way to represent typical calendar date times and ranges of such, you also need ways to represent infinite sequences like 'every monday', and half-infinite sequences like 'every weekday before ...'. And then you'll need an algebra on top of that for combining temporal expressions.
3) Temporal expressions are often ambiguous in the English language and interpretation may vary from culture to culture.
4) The result must often be interpreted in the context of the sentence and/or the conversation history. "Who called Monday?" is a different Monday from "Remind me on Monday" and is different again from "Show me statistics for Monday".
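To illustrate point 2, here is a Java sketch of one way to represent the result for "From Sundays until Thursdays every week": an infinite, lazily generated sequence of date ranges. This is only a representation, not a parser, and it is not how the engine above works; the class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// An infinite weekly sequence of spans; each element is {first day, last day}, both inclusive.
class WeeklySpan {
    static Stream<LocalDate[]> occurrences(LocalDate from, DayOfWeek startDay, DayOfWeek endDay) {
        // First occurrence of startDay on or after 'from'.
        LocalDate first = from.with(TemporalAdjusters.nextOrSame(startDay));
        // Each span ends at the next endDay on or after its start (wraps across the week if needed).
        return Stream.iterate(first, d -> d.plusWeeks(1))
                .map(start -> new LocalDate[] { start, start.with(TemporalAdjusters.nextOrSame(endDay)) });
    }

    // Materializes only the first n spans of the infinite sequence.
    static List<LocalDate[]> firstN(LocalDate from, DayOfWeek startDay, DayOfWeek endDay, int n) {
        return occurrences(from, startDay, endDay).limit(n).collect(Collectors.toList());
    }
}
```

Starting from Monday 2024-01-01, the first two spans are 2024-01-07 to 2024-01-11 and 2024-01-14 to 2024-01-18. The stream stays lazy, which is one way to handle "every week" without enumerating an infinite result.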