What's a good data structure for periodic or recurring dates? - date

Is there a published data structure for storing periodic or recurring dates? Something that can handle:
1. The pump needs recycling every five days.
2. Payday is every second Friday.
3. Thanksgiving Day is the second Monday in October (US: the fourth Thursday in November).
4. Valentine's Day is every February 14th.
5. Solstice is (usually) every June 21st and December 21st.
6. Easter is the Sunday after the first full moon on or after the day of the vernal equinox (okay, this one's a bit of a stretch).
I reckon cron's internal data structure can handle #1, #4, #5 (two rules), and maybe #2, but I haven't had a look at it. MS Outlook and other calendars seem to be able to handle the first five, but I don't have that source code lying around.

Use an iCalendar implementation library, such as the ones available for Ruby, Java, PHP, Python, and .NET, and then add support for calculating special dates.
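As a rough sketch of how that split might look: the first four cases from the question can be written as RFC 5545 recurrence rules (the RRULE strings an iCalendar library works with), while a "special date" such as Easter needs its own calculation. The dictionary keys and the function name below are made up for illustration, and the Easter routine is the widely published anonymous Gregorian computus, not something taken from any of the libraries above.

```python
from datetime import date

# RFC 5545 RRULE strings for cases 1-4; the keys are hypothetical labels.
RRULES = {
    "pump": "FREQ=DAILY;INTERVAL=5",
    "payday": "FREQ=WEEKLY;INTERVAL=2;BYDAY=FR",
    "thanksgiving_october": "FREQ=YEARLY;BYMONTH=10;BYDAY=2MO",
    "valentines": "FREQ=YEARLY;BYMONTH=2;BYMONTHDAY=14",
}


def gregorian_easter(year: int) -> date:
    """Easter Sunday in the Gregorian calendar (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


print(gregorian_easter(2024))  # 2024-03-31
```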

With all these variations in the way you specify the recurrence, I would shy away from one single data structure implementation to accommodate all 5 scenarios.
Instead, I would (and have for a previous project) build simple structures that address each type of recurrence. You could wrap them all up so that it feels like a single data structure, but under the hood they could do whatever they like. By implementing an interface, I was able to treat each type of recurrence similarly so it felt like a one-size-fits-all data structure. I could ask any instance for all the dates of recurrence within a certain time frame, and that did the trick.
I'd also want to know more about how these dates need to be used before settling on a specific implementation.
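A minimal sketch of that idea in Python, assuming date-only granularity. This is not the code from the project mentioned above; every class and function name here is hypothetical. Each recurrence type is its own small structure, and all of them answer the same question: which dates fall inside a given window?

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Iterator, List, Protocol


class Recurrence(Protocol):
    def occurrences(self, start: date, end: date) -> Iterator[date]:
        """Yield every occurrence d with start <= d <= end."""
        ...


@dataclass(frozen=True)
class EveryNDays:
    anchor: date  # first occurrence; nothing recurs before it (assumption)
    n: int

    def occurrences(self, start: date, end: date) -> Iterator[date]:
        if start <= self.anchor:
            current = self.anchor
        else:
            # Ceiling division: number of whole periods needed to reach start.
            periods = -(-(start - self.anchor).days // self.n)
            current = self.anchor + timedelta(days=periods * self.n)
        while current <= end:
            yield current
            current += timedelta(days=self.n)


@dataclass(frozen=True)
class NthWeekdayOfMonth:
    month: int
    weekday: int  # Monday=0 .. Sunday=6, as in date.weekday()
    n: int        # 1 = first, 2 = second, ...

    def occurrences(self, start: date, end: date) -> Iterator[date]:
        for year in range(start.year, end.year + 1):
            first = date(year, self.month, 1)
            offset = (self.weekday - first.weekday()) % 7
            d = first + timedelta(days=offset + 7 * (self.n - 1))
            # Skip years where the month has no n-th such weekday.
            if d.month == self.month and start <= d <= end:
                yield d


@dataclass(frozen=True)
class FixedDate:
    month: int
    day: int  # assumed valid in every year (February 29 would raise)

    def occurrences(self, start: date, end: date) -> Iterator[date]:
        for year in range(start.year, end.year + 1):
            d = date(year, self.month, self.day)
            if start <= d <= end:
                yield d


def all_occurrences(rules: Iterable[Recurrence], start: date, end: date) -> List[date]:
    """Treat a mixed bag of rules as one structure."""
    return sorted(d for rule in rules for d in rule.occurrences(start, end))


rules = [
    EveryNDays(date(2024, 1, 5), 14),  # payday, anchored on a Friday
    NthWeekdayOfMonth(10, 0, 2),       # second Monday in October
    FixedDate(2, 14),                  # Valentine's Day
]
for d in all_occurrences(rules, date(2024, 10, 1), date(2024, 10, 31)):
    print(d)
```

Rules such as Easter would be one more class with the same method, which is the main appeal of hiding the variety behind an interface.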

If you want to build the data structure by hand, I'd recommend a hash table in which each holiday or event is a key and the date of its next occurrence is the value. If an event has multiple occurrences, the key could instead hash to a linked list holding all of them, which would keep both lookup by event and insertion at O(1) on average.
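A small sketch of that layout, assuming Python's built-in dict as the hash table and a plain list standing in for the linked list. The helper names are hypothetical. Note that this stores already-computed dates rather than the rule that generates them.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, List

# Event name -> list of occurrence dates.
occurrences: DefaultDict[str, List[date]] = defaultdict(list)


def add_occurrence(event: str, when: date) -> None:
    # Average O(1): one hash lookup plus an append at the end of the list.
    occurrences[event].append(when)


def occurrences_of(event: str) -> List[date]:
    # .get avoids creating an empty entry for unknown events.
    return occurrences.get(event, [])


add_occurrence("Valentine's Day", date(2025, 2, 14))
add_occurrence("Solstice", date(2025, 6, 21))
add_occurrence("Solstice", date(2025, 12, 21))
print(occurrences_of("Solstice"))
```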

Related

How to properly list time zones within a form?

I am creating a form (using Typeform, but that doesn't really matter) in which I need to find out the time zone of my customers. So far I have left this as an open question, but I would like to provide an extensive list, to ensure consistency in my database.
I could provide a list of time zone abbreviations (such as this one), but then I'd have the problem of daylight saving time. That is, let's say a customer is in the UK: they will see both "Greenwich Mean Time" and "British Summer Time" on the list, and would answer differently depending on the time of the year.
How can I produce a meaningful, non-redundant plaintext list of time zones?

The proper way to represent ranges in semantic URLs

Working on a little side project, I now have the opportunity to design my very own API. Even if it is not a business endeavor, it's the occasion for me to learn more about REST, Resources, Collections and URIs.
My service records data points organized in time-series and will soon provide an API to easily query ranges of data points from specific series. Data points are immutable and as such should be very good candidates for caching. Time-series can be updated only during a limited time window, after which they are archived and read-only (making them also "cacheable").
I have been looking into the APIs of some companies that provide the same kind of services, and I found the following two patterns:
1. Define the series in the path and the range in the query:
/series/:id?from=2017-01-26&to=2017-01-27
This is pretty much what most services out there are using. I understand it as the series being the resources/collections that are then sliced to a specific range. This seems to be very easy to use from a consumer point of view, but from a data point of view, the dates in the query are part of some kind of organization or hierarchy and should in this case be part of the path.
2. Define the series and coordinates in the path:
/series/:x/:y/:z
I didn't find examples of this for time-series, but it is the kind of structure used for tile based map services. Which, to me, means that each combination of x, y and z is a different collection, that might, in some cases contain the same resources or not. It also maps directly to some hierarchy, /series/:x contains all the series with a specific value of x and any value of y and z.
I really like the idea of method 2, and I started with something like:
/series/:id (all data points from a specific series)
/series/:id/:year (all the data points from a specific series and year)
/series/:id/:year/:month
/series/:id/:year/:month/:day
...
Which works pretty well for querying predefined ranges such as "all the data points from 2016" or "all the data points from January 2016". Issues arise when trying to query arbitrary ranges like "all the data points from January 2016 to March 2016".
My first trial was to simply add the start and end to the path:
/series/:id/:year (all the data points from a specific year)
/series/:id/:fromyear/:toyear (all the data points between fromyear and toyear)
But:
It becomes very long, very quickly. Example: /series/:id/:fromyear/:frommonth/:fromday/:toyear/:tomonth/:today, and potentially very cumbersome depending on the chosen structure: /series/:id/:fromyear/:toyear/:frommonth/:tomonth/:fromday/:today
It doesn't make any sense from a hierarchy or structure point of view. In /series/1/2014/2016, 2016 is not a subset of 2014 and this collection is actually going to return data points from 2014, 2015 and 2016.
It is tricky to handle on the server side. Is /series/1/2016/01/02 supposed to return all the data points for January 2nd or for the whole January to February range?
After noticing the way that GitHub references specific lines or ranges of lines in the URL fragment, I played with the idea of defining ranges as being different collections, such as:
/series/:id/:year/:month (same as before)
/series/:id/:year/:frommonth-:tomonth (to get a specific range)
/series/:id/:year/-:tomonth (to get everything from the beginning of the year to tomonth)
/series/:id/:year/:frommonth- (to get everything from frommonth to the end of the year)
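To make the server-side handling of those four forms concrete, here is a minimal sketch of parsing the last path segment. The function name is hypothetical and it assumes months are written as plain integers (with or without a leading zero); it is an illustration, not part of any framework's router.

```python
from typing import Tuple


def parse_month_segment(segment: str) -> Tuple[int, int]:
    """Return the inclusive (from_month, to_month) range for one path segment."""
    if "-" in segment:
        start_text, end_text = segment.split("-", 1)
        # An omitted side defaults to the start or the end of the year.
        start = int(start_text) if start_text else 1
        end = int(end_text) if end_text else 12
    else:
        start = end = int(segment)
    if not 1 <= start <= end <= 12:
        raise ValueError(f"invalid month range: {segment!r}")
    return start, end


for segment in ("03", "01-03", "-03", "06-"):
    print(segment, parse_month_segment(segment))
```

One thing the sketch makes visible: "01-03" and "-03" resolve to the same range, so the same collection ends up reachable under more than one URL, which is worth keeping in mind for the caching question.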
Now, the questions:
Does my solution break any REST or Semantic URL rules/notions/ideas?
Does it improve caching in any way compared to using ranges in the query?
Does it hurt usability for consumers?
Is it unnatural or going against unwritten rules of some Frontend frameworks?

drools working with dates

In the official documentation I can't find any information on how to write conditional statements for java.util.Date type fact fields in guided rules. For example, how do I compare such a field to the current date, check whether it is equal ignoring the time, or check whether the date is earlier than some time from now?
Drools isn't a real-time program and it doesn't have an innate idea of Time or Now. If you need to investigate relations of some fact property w.r.t. some point of time X, you'll have to establish a fact carrying X as its data, and write your rules based on that.
A more or less coarse approximation of a fact representing Now can be made using timers. You can implement a rule that modifies a fact containing a value representing Time (e.g. java.util.Date) every second, or less frequently.
Blending out the time of the day is something you'll have to do using Java or DRL functions. Alternatively, if it is days you are interested in, use some custom class representing days, with some suitable day 1 defined by you.
You can write a condition like
inputDate >= 11-Nov-2014
and supply your current date through the inputDate variable of the rule's input fact.

NLP Date Parsing

I've been experimenting with a number of NLP text parsers, but have found that most fail at even some of the simplest tasks that occur in actual texts (ones that aren't preprocessed to show how "great" the systems are). An example is the following:
From Sundays until Thursdays every week
I've yet to find a single parser that can parse this correctly. I've tried quite a number, including Stanford's SUTime. Can anyone recommend software that can handle natural text dates?
I did not find one either when I went looking so I wrote my own. It's part of my natural language engine for .NET.
Here's what the demo shows when you enter that phrase (qualified to next week rather than every week - it can handle that too but it's infinite):
Some comments:
1) Handling all possible English language temporal expressions is a huge task. I've been working on this problem for years to come up with a clean way to represent temporal expressions plus the many rules needed to parse English expressions of time.
2) In addition to finding a way to represent typical calendar date times and ranges of such, you also need ways to represent infinite sequences like 'every monday', and half-infinite sequences like 'every weekday before ...'. And then you'll need an algebra on top of that for combining temporal expressions.
3) Temporal expressions are often ambiguous in the English language and interpretation may vary from culture to culture.
4) The result must often be interpreted in the context of the sentence and/or the conversation history. "Who called Monday?" is a different Monday from "Remind me on Monday" and is different again from "Show me statistics for Monday".
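To illustrate point 2 only, here is one possible way to represent infinite and half-infinite sequences and combine them. This is a sketch under my own assumptions (day-level granularity, a set of days modelled as a membership test) and is not how the engine described above works; all names are hypothetical.

```python
from datetime import date, timedelta
from itertools import islice
from typing import Callable, Iterator

# A possibly infinite set of days, represented as a membership test.
DaySet = Callable[[date], bool]


def weekdays(*days: int) -> DaySet:
    """Days of the week to include, Monday=0 .. Sunday=6."""
    wanted = frozenset(days)
    return lambda d: d.weekday() in wanted


def before(limit: date) -> DaySet:
    return lambda d: d < limit


def both(a: DaySet, b: DaySet) -> DaySet:
    return lambda d: a(d) and b(d)


def either(a: DaySet, b: DaySet) -> DaySet:
    return lambda d: a(d) or b(d)


def enumerate_from(days: DaySet, start: date) -> Iterator[date]:
    """Lazily walk forward from start; the caller must bound it (e.g. islice)."""
    current = start
    while True:
        if days(current):
            yield current
        current += timedelta(days=1)


# "From Sundays until Thursdays every week"
sunday_to_thursday = weekdays(6, 0, 1, 2, 3)
# "every weekday before 2024-01-10": a half-infinite sequence
weekdays_before = both(weekdays(0, 1, 2, 3, 4), before(date(2024, 1, 10)))

for d in islice(enumerate_from(sunday_to_thursday, date(2024, 1, 1)), 6):
    print(d, d.strftime("%a"))
```

Because the sets are only tested, never materialized, "every Monday" costs nothing to store, and the algebra is just function composition. Parsing English into these combinators is the hard part and is not attempted here.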

Temporal Extraction (i.e. Extract date/time entities from free form text) - How?

Has anyone found a simple but effective way to extract date references from text? I've done a fair amount of searching for temporal extraction tools, but there isn't a lot out there. There are a few white papers, but the topic seems to fall into a subset of the whole semantic web thingy that isn't given much attention.
I'm just looking for something that is 80% effective. There is no need to capture things like "the month after Jan 2009", but basic common date entities would be nice.
I'm open to all suggestions, even fancy regex expressions.
Fire away!
(and thanks - Henry)
If the target temporal expressions in your data come in only a limited set of formats, use regular expressions and refine your system iteratively.
Otherwise, use SUTime from the Stanford NLP toolkit, which might be overkill but will definitely meet your needs.
One way I have done this is to just look for anything that is 4 digits and convert it to a number. If the number falls within the range of years you are interested in, you probably have a year you can use. If you are interested in any matching months and days you could check adjacent words to see if they are a month name or a number between 1 and 31. I am confident this would satisfy your 80% requirement.
Regex for years: [0-9]{4} - you will need to convert to a number and see if it's within the range of years you consider valid.
Regex for months: jan|january|feb|february ... etc for each month
Regex for days of the month: [0-9]{1,2} - you would need to convert to a number and see if it is 1-31
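Put together, the three regexes above might be used roughly like this. It is only a sketch: the function name, the default year range, and the choice to look at the two words just before the year are my assumptions for the example, and it will produce false positives (the word "may", for one).

```python
import re
from typing import List, Optional, Tuple

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]
# Full names and three-letter abbreviations both map to the month number.
MONTH_LOOKUP = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTH_LOOKUP[name] = number
    MONTH_LOOKUP[name[:3]] = number

TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")
YEAR = re.compile(r"[0-9]{4}")

DateHit = Tuple[int, Optional[int], Optional[int]]  # (year, month, day)


def extract_dates(text: str, min_year: int = 1900, max_year: int = 2100) -> List[DateHit]:
    tokens = TOKEN.findall(text)
    hits: List[DateHit] = []
    for i, token in enumerate(tokens):
        if not YEAR.fullmatch(token):
            continue
        year = int(token)
        if not min_year <= year <= max_year:
            continue
        month = day = None
        # Check the two tokens before the year for a month name or a day number.
        for neighbor in tokens[max(0, i - 2):i]:
            word = neighbor.lower()
            if word in MONTH_LOOKUP:
                month = MONTH_LOOKUP[word]
            elif word.isdigit() and 1 <= int(word) <= 31:
                day = int(word)
        if month is None:
            day = None  # a stray number without a month is not treated as a day
        hits.append((year, month, day))
    return hits


print(extract_dates("Moved from Feb 14, 2009 to 3 March 2010; see the 2011 report."))
```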
I'm drawing a blank on how to find what to feed it, but this library will parse a wide range of dates and could be used as the "is this a real date" function. (Full disclosure, I'm the author of that lib)