Postgresql Modeling Ranges - postgresql

I'm looking to model a group of users that provider various services that take various times and hoping to build on the relatively new ranges datetypes that Postgresql supports to make things a lot cleaner.
Bookings:
user_id|integer
time|tsrange
service_id|integer
Services:
user_id|integer
time_required|integer #in hours
Users:
id|integer
Services vary between users, some might be identical but takes one user 2 hours and another just 1 hour.
Searching for bookings that occur within, or not within a given time period are easy. I'm having trouble figuring out how best I would get all the users who have time available on a given day to perform one or more of their services.
I think I need to find the inverse of their booked ranges, bound by 6am/8pmn on a given day and then see if the largest range within that set will fit their smallest service, but figuring out how express that in SQL is eluding me.
Is it possible to do inverses of ranges, or would their be a better way to approach this?

The question isn't entirely clear, but if I get it right you're look for the lead() or lag() window function to compute available time slots. If so, this should put you on the right track:
select bookings.*, lead("time") over (order by "time") as next_booking_time
from bookings
http://www.postgresql.org/docs/current/static/tutorial-window.html

Related

Do I have to loop through each 'page' of orders to get all orders in one WooComerce REST Api query?

I've built a KNIME workflow that helps me analyse (sales) data from numerous channels. In the past I used to export all orders manually and use an XSLX or CSV reader but I want to do it via WooCommerce's REST API to reduce manual labor.
I would like to be able to receive all orders up until now from a single query. So far, I only get as many as the # I fill in for &per_page=X. But if I fill in like 1000, it gives an error. This + my common sense give me the feeling I'm thinking the wrong way!
If it is not possible, is looping through all pages the second best thing?
I've managed to connect to the api via basic auth. The following query returns orders, but only 10:
I've tried increasing the number per_page but I do not think this is the right way to get all orders in one table.
https://XXXX.nl/wp-json/wc/v3/orders?consumer_key=XXXX&consumer_secret=XXXX
My current mindset would like to be able to receive all orders up until now from a single query. But it personally also feels like that this is not the common way to do it. Is looping through all pages the second best thing?
Thanks in advance for your responses. I am more of a data analist than a data engineer or scientist and I hope your answers will help me towards my goal of being more of a scientist :)
It's possible by passing params "per_page" with the request
per_page integer Maximum number of items to be returned in result set. Default is 10.
Try -1 as the value
https://woocommerce.github.io/woocommerce-rest-api-docs/?php#list-all-orders

The proper way to represent ranges in semantic URLs

Working on a little side project, I have now the opportunity to design my very own API. Event if it is not a business endeavor, it's the occasion for me to learn more about REST, Resources, Collections and URIs.
My service, records data points organized in time-series and will soon provide an API to easily query ranges of data points from specific series. Data points are immutable and as such should be very good candidates for caching. Time-series can be updated only during a limited time window, after which they are archived and readable only (making them also "cachable").
I have been looking into the APIs of some companies that provide the same kind of services, and I found the following two patterns:
Define the series in the path and the range in the query:
/series/:id?from=2017-01-26&to=2017-01-27
This is pretty much what most services out there are using. I understand it as
the series being the resources/collections that are then sliced to a specific range. This seems to be very easy to use from a consumer point of view, but from a data point of view, the dates in the query are part of some kind of organization or hierarchy and should in this case be part of the path.
Define the series and coordinates in the path:
/series/:x/:y/:z
I didn't find examples of this for time-series, but it is the kind of structure used for tile based map services. Which, to me, means that each combination of x, y and z is a different collection, that might, in some cases contain the same resources or not. It also maps directly to some hierarchy, /series/:x contains all the series with a specific value of x and any value of y and z.
I really like the idea of the method 2. and I started with something like:
/series/:id (all data points from a specific series)
/series/:id/:year (all the data points from a specific series and year)
/series/:id/:year/:month
/series/:id/:year/:month/:day
...
Which works pretty well for querying predefined ranges such as "all the data points from 2016" or "all the data points from January 2016". Issues arise when trying to query arbitrary ranges like "all the data points from January 2016 to March 2016".
My first trial was to simply add the start and end to the path:
/series/:id/:year (all the data points from a specific year)
/series/:id/:fromyear/:toyear (all the data points between fromyear and toyear)
But:
It becomes very long, very quick. Example: /series/:id/:fromyear/:frommonth/:fromday/:toyear/:tomonth/:today and potentially very cumbersome depending of the chosen structure /series/:id/:fromyear/:toyear/:frommonth/:tomonth/:fromday/:today
It doesn't make any sense from a hierarchy or structure point of view. In /series/1/2014/2016, 2016 is not a subset of 2014 and this collection is actually going to return data points from 2014, 2015 and 2016.
It is tricky to handle on the server side. Is /series/1/2016/01/02 supposed to return all the data points for the January the 2nd or for the whole January to February range ?
After noticing the way that Github references specific lines or ranges of lines in their fragment, I played with the idea of defining ranges as being different collections, such as:
/series/:id/:year/:month (same than before)
/series/:id/:year/:frommonth-:tomonth (to get a specific range)
/series/:id/:year/-:tomonth (to get everything from the beginning of the year to tomonth)
/series/:id/:year/:frommonth- (to get everything from frommonth to the end of the year)
Now, the questions:
Does my solution break any REST or Semantic URL rules/notions/ideas?
Does it improve caching in anyway compared to using ranges in the query?
Does it hurt usability for consumers?
Is it unnatural or going against unwritten rules of some Frontend frameworks?

How should I deal in correct way with recuring events and possible events' details? (Mongo,Vue)

I am struggling with the way how I should design scheme for my data.
I saw a lot of threads with recuring calendar events but I can't translate it into my case as it is slightly different and this small difference making it tough.
Basically I have two models which don't let me sleep. EVENT and EVENT_DETAILS
First of all, I am using MongoDB but this is mostly more high level abstraction than going down to DB level, but I can be wrong.
Backend is SailsJS.
On the frontend I have VueJS SPA.
I have events. Most of them are recuring forever (permanent), but some of them are either one off, so have exact date when they happen,
some of them are recuring on one day of week in given period and some of permanent are suspended for given period.
EVENT can exist without EVENT_DETAILS, but not the other way of course.
I need to let user go through WEEKS periods and be able to see list of EVENTS in given period and related EVENT_DETAILS and if there is no EVENT_DETAILS,
let him create it.
For handling permanent, suspension and temporary, I have following attributes in
my EVENT (I show only relevant atts):
TYPE: [0,1,2]; 0 - permanent, 1 - temp, 2 - one off
SUSPENDED: true/false
SUSPENSION_DATES (from,to)
PERIOD (from,to) - for temporary
DAY: [0-6] - day of the week (choosen for 0 and 1, autmatically for 2 - one off
and of course association to EVENT_DETAILS.
Is it ok or should I amend it to make it correct/better/easier to deal with ?
How should I handle criteria to give user complete list of events for particular period?
I am of course not asking for code as it is not an issue for me.
I am asking for help with logic/criteria as I am really stuck with that for few days.
A few improvements you may consider:
since your events are specified by the unit of day, events happening on one day and those last for a period can be normalized into one, by setting start and end dates to the same for one-off events.
As for a field in a schema in mongodb, not only primitive types are allowed, objects/arrays can also be used to form complicated structures. So you can just store an event's details in the event schema, with a field of array of strings, like {type: [String], default: []} in mongoose syntax, or even {type: [{label: String, description: String}], default: []}.
For the second bullet, embedding documents in mongodb is good for forming a model-like database structure, but mongodb doesn't do really well in aggregation, so only embed those "belonging" to a schema (or, those with a one-one or multi-one relation mapping to another, not multi-multi), in our case here, it should be safe to do so since details belong to single events.
So now there's just events in periods and weekly events, which can't be normalized into one since one is continous and the other is not. Having a type flag to identify them seems all right to me. You can make some comments/documents and add some configs like const EVENT_TYPES = {PERMANENT: 0, TEMP: 1} and use these readable type configs instead of 0 1s to make sure your colleages and yourself won't hate you in the future ;)

Introduction to object databases

I'm trying to understand the idea of noSQL databases, to be more precise, the concept behind neo4j graph database. I have experience with SQL databases (MySQL, MS SQL), but the limitations of managing hierarchical data made me to expand my knowledge. But now I have some questions and I can't find their answers (maybe I don't know what to search).
Imagine we have list of countries in the world. Each country has it's GDP every year. Each country has it's GDP calculated by different sources - World Bank, their government, CIA etc. What's the best way to organise data in this case?
The simplest thing which came in mind is to have the node (the values are imaginary):
China:
GDPByWorldBank2012: 999,
GDPByCIA2011: 994,
GDPByGovernment2012: 1102,
In relational database, I would split the data in three tables: Countries, Sources and Values, where in Values I would have value of GDP, year, id of the country and id of the source.
Other thing which came in mind is to create nodes CIA, World bank, but node Government looks really weird. Even though, the idea is to have relationships (valueIfGDP):
CIA -> valueOfGDP - {year: 2011, value: 994} -> China
World Bank -> valueOfGDP - {year: 2012, value: 999} -> China
This looks pretty weird for me, what is more, what happens when we add the values for all the years from one source? We would have multiple relationships or what?
I'm sorry if my questions are too dumb and I would be happy if someone explain me or show me what book/article to read.
Thanks in advance. :)
Your questions are very legit and you're not the only one having difficulties to grasp graph modelling at first ;)
It is always easier to start thinking about the questions you wanna answer with your data before modelling it up front.
Let's imagine you wanna retrieve the GDP of year 2012 computed by CIA of all countries.
A simple way to achieve this is to label country nodes uniformly, and set an attribute name that obviously depends on the country name.
Moreover, CIA/WorldBank/Government in this domain are all "sources", let's label them uniformly as well.
For instance, that could give something like:
(ORGANIZATION {name: CIA})-[:HAS_COMPUTED_GDP {year:2011, value:994}]->(COUNTRY {name:China})
With Cypher Query Language, following this model, you would execute the following query:
START cia = node:nodes(name = "CIA")
MATCH cia-[gdp:HAS_COMPUTED_GDP]->(country)
WHERE gdp.year = 2012
RETURN cia, country, gdp
In this query, I used an index lookup as a starting point (rather than IDs which are a internal technical notion that shouldn't be used) to retrieve CIA by name and match the relevant subgraph to finally return CIA, the GDP relationships and their linked countries matching the input constraints.
Although Neo4J is totally schemaless, this does not mean you should necessarily have a totally flexible data model. Having a little structure will always help to make your queries or traversals easier to read.
If you're not familiar with Cypher Query Language (which is not the only way to read or write data into the graph), have a look at the excellent documentation of Neo4J (Cypher: http://docs.neo4j.org/chunked/stable/cypher-query-lang.html, complete: http://docs.neo4j.org/chunked/stable/index.html) and try some queries there: http://console.neo4j.org/!
And to answer your second question, if you wanna add another year of GDP computations, this will just boil down to adding new relationship "HAS_COMPUTED_GDP" between the organizations and the countries, no more no less.
Hope it helps :)

How to approximate processing time?

It's common to see messages like "Installation will take 10 min aprox." , etc in desktop applications. So, I wonder how can I calculate an approximate of how much time a certain process will take. Off course I won't install anything but I want to update some internal data and depending on the user usage this might take some time.
Is this possible in a iPhone app? How Cocoa guys do this, would it be the same way in iPhone apps?
Thanks in advance.
UPDATE: I want to rewrite/edit some files on disk, most of the time these files are not the same size so I cannot use timers for the first iteration and calculate the rest from that.
Is there any API that helps on calculating this?
If you have some list of things to process, each "thing" - usually better to measure a group of 10 or so "things" - is a unit of work. Your goal is to see how long it takes to process a single group and report the estimated time to completion.
One way is to create an NSDate at the start of each group and a new one at the end (the top and bottom of your for loop) for each group. Multiply the difference in seconds by however many groups you have left (minus the one you just processed) and that should be a reasonable estimate of the time remaining.
Of course this gets more complicated if one "thing" takes a lot longer to process than another "thing" - the above approach assumes all things take the same amount of time. In this case, however, you may need to keep track of an average window (across the last n "things" or groups thereof).
A more detailed response would require more details about your model and what work you're performing.