Getting Recursive Tasks in Asana with reasonable performance - rest

I'm using the Asana REST API to iterate over workspaces, projects, and tasks. After I achieved the initial crawl over the data, I was surprised to see that I only retrieved the top-level tasks. Since I am required to provide the workspace and project information, I was hoping not to have to recurse any deeper. It appears that I can recurse on a single task with the /subtasks endpoint and re-query... wash/rinse/repeat... but that amounts to a potentially massive number of REST calls (one for each subtask to see if they, in turn, have subtasks to query - and so on).
I can partially mitigate this by adding to the opt_fields query parameter something like:
&opt_fields=subtasks,subtasks.subtasks
However, this doesn't scale well. It means I have to elongate the query for each layer of depth. I suppose I could say "don't put tasks deeper than x layers deep" - but that seems to fly in the face of Asana's functionality and design. Also, since I need lots of other properties, it requires me to make a secondary query for each node in the hierarchy to gather those. Ugh.
I can use the path method to try to mitigate this a bit:
&opt_fields=(this|subtasks).(id|name|etc...)
but again, I have to do this for every layer of depth. That's impractical.
There's documentation about this great REPEATER + operator. Supposedly it would work like this:
&opt_fields=this.subtasks+.name
That is supposed to apply to ALL subtasks anywhere in the hierarchy. In practice, this is completely broken, and the REST API chokes and returns only the ids of the top-level tasks. :( Apparently their documentation is just wrong here.
The only method that seems remotely functional (if not practical) is to iterate first on the top-level tasks, being sure to include opt_fields=subtasks. Whenever this is a non-empty array, I would need to recurse on that task, query for its subtasks, and continue in that manner, until I reach a null subtasks array. This could be of arbitrary depth. In practice, the first REST call yields me (hopefully) the largest number of tasks, so the individual recursion may be mitigated by real data... but it's a heck of an assumption.
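The recursion described above can be sketched as a breadth-first walk that produces a flat list with parent ids. This is only a sketch under assumptions: fetch_subtasks is a hypothetical callable standing in for whatever HTTP request you make to the subtasks endpoint, and the id, name, and subtasks keys are an assumed response shape, not something confirmed by the Asana documentation.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

Task = Dict[str, object]


def crawl_tasks(
    top_level: Iterable[Task],
    fetch_subtasks: Callable[[object], List[Task]],
) -> List[Task]:
    """Flatten a task hierarchy, spending one fetch per task that reports subtasks."""
    flat: List[Task] = []
    # Each queue entry is (task record, id of its parent or None).
    queue = deque((task, None) for task in top_level)
    while queue:
        task, parent_id = queue.popleft()
        flat.append(
            {"id": task["id"], "name": task.get("name"), "parent": parent_id}
        )
        # Only issue a request when the task says it has children.
        if task.get("subtasks"):
            for child in fetch_subtasks(task["id"]):
                queue.append((child, task["id"]))
    return flat
```

The number of requests equals the number of tasks with a non-empty subtasks array, so the cost depends on how deep and how bushy the real data is, which is exactly the assumption the question worries about.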
I also noticed that the limit parameter applies ONLY to the top-level tasks. If I choose to expand the subtasks, say, I could get a thousand tasks back instead of 100. The call could time out if the data is too large. The safest thing to do would be to request only the ids of subtasks until recursion and, as always, ask for all the desired top-level properties at that time.
All of this seems incredibly wasteful - what I really want is a flat list of tasks which include the parent.id and possibly a list of subtasks.id - but I don't want to query for them hierarchically. I also want to page my queries with rational data sizes in mind. I'd like to get 100 tasks at a time until Asana runs out - but that doesn't seem possible, since the limit only applies to top-level items.
Unfortunately the repeater didn't solve my problem, since it just doesn't work. What are other people doing to solve this problem? And, secondarily, can anyone with intimate Asana insight provide any hope of getting a better way to query?
While I'm at it, a suggested way to design this: the task endpoint should not require workspace or project predicate. I should be able to filter by them, but not be required to. I am limited to 100 objects already, why force me to filter unnecessarily? In the same vein - navigating the hierarchy of Asana seems an unnecessary tax for clients who are not Asana (and possibly even the Asana UI itself).
Any ideas or insights out there?

Have you ensured that the + you send is URL-encoded? Whatever library you are using should usually handle this (which language are you using, btw? We have some first-party client libraries available)
Try &opt_fields=this.subtasks%2B.name if you're creating the URL manually, or (better yet) use a library that correctly encodes URL query parameters.
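As a small illustration of the encoding issue, here is how Python's standard library handles it. The opt_fields value is the one from the question; nothing here is specific to Asana's client libraries.

```python
from urllib.parse import quote, urlencode

opt_fields = "this.subtasks+.name"

# urlencode percent-encodes "+" so the server does not decode it as a space.
query = urlencode({"opt_fields": opt_fields})
print(query)  # opt_fields=this.subtasks%2B.name

# Equivalent when assembling the query string by hand.
manual = "opt_fields=" + quote(opt_fields, safe="")
print(manual == query)  # True
```

A literal + in a query string is commonly decoded as a space, which would explain why the repeater appeared to be ignored.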

REST design principles: Referencing related objects vs Nesting objects

My team and I are refactoring a REST-API and I have come to a question.
For terms of brevity, let us assume that we have an SQL database with 4 tables: Teachers, Students, Courses and Classrooms.
Right now all the relations between the items are represented in the REST-API through referencing the URL of the related item. For example for a course we could have the following
{ "id":"Course1", "teacher": "http://server.com/teacher1", ... }
In addition, if I ask for a list of courses through a GET call to /courses, I get a list of references as shown below:
{
... //pagination details
"items": [
{"href": "http://server1.com/course1"},
{"href": "http://server1.com/course2"}...
]
}
All this is nice and clean but if I want a list of all the courses titles with the teachers' names and I have 2000 courses and 500 teachers I have to do the following:
1. Approximately 2500 queries just to read the data.
2. Implement the join between the teachers and courses.
3. Optimize with caching etc., so that I will do it as fast as possible.
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently.
Colleagues say that this approach is the standard way of implementing a REST-API, but then a relatively simple query becomes a big hassle.
My question therefore is:
1. Is it wrong if we nest the teacher information in the courses?
2. Should the listing of items e.g. GET /courses return a list of references or a list of items?
Edit: After some research I would say the model I have in mind corresponds mainly to the one shown in jsonapi.org. Is this a good approach?
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST-API, but then a relatively simple query becomes a big hassle.
Your colleagues have lost the plot.
Here's your heuristic - how would you support this use case on a web site?
You would probably do it by defining a new web page that produces the report you need. You'd run the query, use the result set to generate a bunch of HTML, and ta-da! The client has the information that they need in a standardized representation.
A REST-API is the same thing, with more emphasis on machine readability. Create a new document, with a schema so that your clients can understand the semantics of the document you return to them, tell the clients how to find the target uri for the document, and voila.
Creating new resources to handle new use cases is the normal approach to REST.
Yes, I totally think you should design something similar to jsonapi.org. As a rule of thumb, I would say "prefer a solution that requires fewer network calls". This is especially true if the number of network calls drops by an order of magnitude.
Of course it doesn't eliminate the need to limit the request/response size if it becomes unreasonable.
Real life solutions must have a proper balance. Clean API is nice as long as it works.
So in your case I would do something like:
GET /courses?include=teachers
Or
GET /courses?includeTeacher=true
Or
GET /courses?includeTeacher=brief|full
In the last one the response can have only the teacher's id for brief and full teacher details for full.
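The brief/full variant can be sketched on the server side as follows. This is a minimal sketch with made-up in-memory data: TEACHERS, COURSES, list_courses and the field names are all hypothetical, and a real implementation would run a single joined database query instead of dictionary lookups.

```python
from typing import Dict, List

# Hypothetical stand-ins for the Teachers and Courses tables.
TEACHERS = {"teacher1": {"id": "teacher1", "name": "Ada", "office": "B12"}}
COURSES = [{"id": "course1", "title": "Algebra", "teacher_id": "teacher1"}]


def list_courses(include_teacher: str = "none") -> List[Dict]:
    """Build the GET /courses payload; include_teacher is 'none', 'brief' or 'full'."""
    if include_teacher not in ("none", "brief", "full"):
        raise ValueError("includeTeacher must be none, brief or full")
    items = []
    for course in COURSES:
        item = {"id": course["id"], "title": course["title"]}
        teacher = TEACHERS[course["teacher_id"]]
        if include_teacher == "brief":
            # Only the teacher's id, as described above.
            item["teacher"] = {"id": teacher["id"]}
        elif include_teacher == "full":
            # Full teacher details embedded in the course.
            item["teacher"] = dict(teacher)
        items.append(item)
    return items
```

With includeTeacher=full, the 2000-course report becomes one request (or a few pages) instead of roughly 2500.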
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST-API, but then a relatively simple query becomes a big hassle.
Have you actually measured the overhead generated by each request? If not, how do you know that the overhead will be too intense? From an object-oriented programmer's perspective it may sound bad to perform each call on its own; your design, however, lacks one important asset which helped the Web grow to its current size: caching.
Caching can occur on multiple levels. You can do it on the API level, or the client might do it, or an intermediary server might do it. Fielding even made it a constraint of REST! So, if you want to comply with the REST architecture philosophy you should also support caching of responses. Caching helps to reduce the number of requests that have to be calculated or even processed by a single server. With the help of stateless communication you might even introduce a multitude of servers that all perform calculations for billions of requests and act as one cohesive system to the client. An intermediary cache may further help to significantly reduce the number of requests that actually reach the server.
A URI as a whole (including any path, matrix or query parameters) is actually a key for a cache. Upon receiving a GET request, for example, an application checks whether its current cache already contains a stored response for that URI and returns the stored response on behalf of the server directly to the client if the stored data is "fresh enough". If the stored data has already exceeded the freshness threshold, it will throw away the stored data and route the request to the next hop in line (might be the actual server, might be a further intermediary).
Spotting resources that are ideal for caching might not be easy at times, though the majority of data doesn't change so quickly that caching should be neglected completely. Thus, it should be, at least, of general interest to introduce caching, especially the more traffic your API produces.
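The lookup described above can be sketched like this. It is a deliberately simplified model: UriCache, fetch and max_age are hypothetical names, and real HTTP caches derive freshness from response headers such as Cache-Control rather than from a single fixed age.

```python
import time
from typing import Callable, Dict, Tuple


class UriCache:
    """Serve stored responses keyed by full URI while they are fresh enough."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # next hop: origin server or another cache
        self._max_age = max_age  # freshness threshold in seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, uri: str) -> str:
        now = self._clock()
        entry = self._store.get(uri)
        if entry is not None and now - entry[0] <= self._max_age:
            return entry[1]  # fresh: answer on behalf of the server
        # Missing or stale: forward the request and store the new response.
        body = self._fetch(uri)
        self._store[uri] = (now, body)
        return body
```

Because the whole URI is the key, /courses?page=1 and /courses?page=2 are cached independently.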
While certain media-types such as HAL JSON, jsonapi, ... allow you to embed content gathered from related resources into the response, embedding content has some potential drawbacks such as:
Utilization of the cache might be low due to mixing data that changes quickly with data that is more static
Server might calculate data the client won't need
One server calculates the whole response
If related resources are only linked to instead of directly embedded, a client for sure has to fire off a further request to obtain that data, though it is actually more likely to get (partly) served by a cache which, as mentioned a couple of times now throughout the post, reduces the workload on the server. Besides that, a positive side effect could be that you gain more insight into what the clients are actually interested in (if an intermediary cache is run by you, for example).
Is it wrong if we nest the teacher information in the courses?
It is not wrong, but it might not be ideal as explained above
Should the listing of items e.g. GET /courses return a list of references or a list of items?
It depends. There is no right or wrong.
As REST is just a generalization of the interaction model used in the Web, basically the same concepts apply to REST as well. Depending on the size of the "item", it might be beneficial to return a short summary of the item's content and add a link to the item. Similar things are done in the Web as well. For a list of students enrolled in a course, this might be the student's name and matriculation number, plus a link where further details of that student can be requested, accompanied by a link-relation name that gives the link some semantic context which a client can use to decide whether invoking that URI makes sense or not.
Such link-relation names are either standardized by IANA, defined by common vocabularies such as Dublin Core or schema.org, or custom extensions as defined in RFC 8288 (Web Linking). For the above-mentioned list of students enrolled in a course you could, for example, make use of the about relation name to hint to a client that further information on the current item can be found by following the link. If you want to enable pagination, first, next, prev and last can and probably should be used as well, and so forth.
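For the pagination relations just mentioned, a response could carry links built along these lines. This is only a sketch: page_links, the page and per_page query parameters and the URL layout are assumptions for illustration; only the relation names first, prev, next and last come from the text.

```python
from typing import Dict


def page_links(base: str, page: int, per_page: int, total: int) -> Dict[str, str]:
    """Return pagination links keyed by link-relation name."""
    # Ceiling division; an empty collection still has one (empty) page.
    last = max(1, -(-total // per_page))

    def url(p: int) -> str:
        return f"{base}?page={p}&per_page={per_page}"

    links = {"first": url(1), "last": url(last)}
    if page > 1:
        links["prev"] = url(page - 1)
    if page < last:
        links["next"] = url(page + 1)
    return links
```

A client then follows next until the relation is absent, without having to know how the server constructs page URLs.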
This is actually what HATEOAS is all about. Linking data together and giving them meaningful relation names to span a kind of semantic net between resources. By simply embedding things into a response such semantic graphs might be harder to build and maintain.
In the end, whether you embed or reference resources basically boils down to an implementation choice. I hope I could shed some light on the usefulness of caching and the benefits it could yield, especially on large-scale systems, as well as on the benefit of providing link-relation names for URIs, which enhance the semantic context of relations used within your API.

Multiple Dialogflow commands asked at same time

I have an Action where the user can set values of different parameters. Currently this is implemented something like this, and it works well:
Now I want to make the conversation less robot-like and more flexible, so I would like to allow users to set or change more than one value at a time. They should be able to say things like
Change the Interest Rate to 4% and the Term to 15 years.
or
Change the Interest Rate to 4%, the Term to 15 years, and the Years to Average Principal to 3.
There are a couple of ways to do this, but none of them are great, and all of them have issues of some sort when you try to scale them. (So they might work well for two or three parameters entered, but they probably won't work well for more than that.)
(It is worth noting, just for reference, that the Assistant itself has only recently started accepting more than one instruction at a time. But it only handles two, and this doesn't work for all commands.)
Add phrases with additional parameters
With this solution, you would supplement the phrases you have that collect one parameter with a similar set of phrases that collect two parameters. And then another set that also collect three parameters. You should be able to do these all as a single Intent and, in your fulfillment, determine which ones have been set.
It might look something like this:
That looks like it starts getting complicated, doesn't it? You need to list each combination of absolute values and percentages. If you have other types, you need to include each of those combinations as well. That starts getting unwieldy for 3 possible parameters, and certainly is above that. You also run the risk that it might get confused about which parameter should be set with which value (I haven't tested this - it is a theoretical concern).
Add an optional continuation phrase and handle that recursively
You can also treat this as the user saying "set a value, and then do something else" and treat the "do something else" part as another statement made to Dialogflow. The Intent might look something like this:
You can implement the "another statement made to Dialogflow" using the Dialogflow API. With Dialogflow V1, you'd use the Query endpoint. With Dialogflow V2, you'd use the detectIntent endpoint. In either case, you'd send the additional part of the query (if the user said something) and would get back the results from that. You'd add the resulting message from the call to the message from setting the current set of values and send the whole thing back.
As a recursive call, however, this does take up time. Since the initial call to Dialogflow really needs to be answered within 5 seconds, every additional call to Dialogflow (and then to your fulfillment) needs to be handled as quickly as possible. But even so, you probably won't be able to handle more than 2 or 3 of these before things time out on the front end.
It also runs the risk (or benefit) that other intents besides the edit.attribute Intent might be called in the "additional" portion. If you want to limit the risk of this, you could set a context to make sure that only Intents that have that incoming context would be called.
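The continuation approach, including the time limit, can be sketched as below. It is written as a loop rather than literal recursion, which behaves the same for a chain of continuations. Everything here is an assumption for illustration: detect is a hypothetical wrapper around your call to the Query or detectIntent endpoint that returns the reply message and any leftover "additional" text, and the 4-second budget is just a margin under the 5-second limit mentioned above.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical result shape: (reply message, leftover text or "").
DetectResult = Tuple[str, str]


def handle_utterance(
    text: str,
    detect: Callable[[str], DetectResult],
    budget_seconds: float = 4.0,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Process one statement at a time until no text is left or time runs out."""
    deadline = clock() + budget_seconds
    messages: List[str] = []
    while text:
        if clock() >= deadline:
            # Tell the user which part was not applied.
            messages.append("I ran out of time before handling: " + text)
            break
        message, text = detect(text)
        messages.append(message)
    return " ".join(messages)
```

Reporting the unhandled remainder is one way to cover the error-handling case where some values have been changed and others have not.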
Summary
This really isn't an easy problem to solve. On one hand, you have the problem of having to list out every combination. On the other hand, recursion takes time, and you don't have a lot of time to process everything. In both cases, there is a real possibility of the phrase being understood incorrectly and you'll need to figure out error handling in the case where some values have been changed and others haven't.
You may need to experiment a lot, and the results may still not be satisfactory.
You can implement the "another statement made to Dialogflow" using the
Dialogflow API. With Dialogflow V1, you'd use the Query endpoint.
With Dialogflow V2, you'd use the detectIntent endpoint. In either
case, you'd send the additional part of the query (if the user said
something) and would get back the results from that. You'd add the
resulting message from the call to the message from setting the
current set of values and send the whole thing back.
As a recursive call, however, this does take up time. Since the
initial call to Dialogflow really needs to be answered within 5
seconds, every additional call to Dialogflow (and then to your
fulfillment) needs to be handled as quickly as possible. But even so,
you probably won't be able to handle more than 2 or 3 of these before
things time out on the front end.
The first thing that came to mind after reading those two paragraphs was batch requests.
A batch request allows a client application to pack multiple API calls into a single HTTP request (this batching technique is also known as a multi-part request).
Many Google APIs support a batch endpoint and I was able to verify that DialogFlow has a batch endpoint by checking its API Discovery document. This batch endpoint is not formally documented in DialogFlow's API reference but you can leverage the documentation of other APIs (like this one) to get a feel for how it works. This link should also be instructive now that the global batch endpoint is no longer supported.
Assuming your queries are independent (i.e. they don't rely on the results of other queries) you should be able to use a batch request to fetch more data.

How to design a query where I retrieve last data from resource that I want to apply filter to in RESTful way?

What should a query look like when I want to retrieve the last measurements from installations that aren't removed?
Something like this?
/my-web-service/installations/measurements/last?removed=false
The thing is, I don't want to retrieve last measurements that weren't removed from installations. I want to retrieve last measurements from installations that weren't removed.
I see a couple possibilities here:
If you need to read the data from the endpoint transactionally, the way you designed it is the way to go. What I'd change is the name of the param from removed to installationRemoved, since it's more descriptive, and shorten the endpoint to /my-web-service/measurements/ - since with installations it's unclear which scope the client operates in. Also, don't you need a since param to filter the last measurements?
If there's a chance to split it into two endpoints, I'd add:
/my-web-service/installations/?removed=false
/my-web-service/measurements/?since=timestamp&installations=<array>
It does not make it better (when it comes to better or worse) but easier and more predictable for the users.
In general, try to add more general endpoints with filtering options rather than highly dedicated ones that do one particular thing. The latter way leads to a hard-to-use, loose API. Also, on filtering.
And a final note: your API is good if your clients use it not because they have to but because they like it ;)
According to this best practices article, you could use "aliases for common queries":
To make the API experience more pleasant for the average consumer,
consider packaging up sets of conditions into easily accessible
RESTful paths. For example, the recently closed tickets query above
could be packaged up as GET /tickets/recently_closed
So, in your case, it could be:
/my-web-service/installations/non_removed/measurements/last
where non_removed would be an alias for querying installations that weren't removed.
Hope it helps!

Conducting searches with REST that return large datasets?

I'm creating a RESTful WebAPI for our system in .Net. When conducting a search in my client, I presume that it should hit the /person route, passing parameters when required to filter the data. However, the person object that is returned has quite a lot of nested objects, which could slow down data retrieval. Should I have a separate controller which returns a more skeletonised view of a person, should I continue the way I am going, or should I be making subsequent requests to break down the person?
Actually, there is no silver-bullet way to solve your problem, but there are several approaches which could be useful for you. In my opinion, your idea about optimizing the size of the resource representation in search results is correct.
You can include the list of requested fields in filtering query. (for example, see the similar signature/approach in ES search API). Many search engines are following this approach to reduce redundant response payload.
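A requested-fields filter can be sketched like this. The project function, the comma-separated fields syntax and the sample person record are all assumptions for illustration; search engines and APIs each define their own parameter format.

```python
from typing import Dict, Optional


def project(record: Dict[str, object], fields: Optional[str]) -> Dict[str, object]:
    """Keep only the top-level fields named in a comma-separated list."""
    if not fields:
        # No field list given: return the full representation.
        return dict(record)
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    # Unknown field names are silently ignored in this sketch.
    return {name: record[name] for name in wanted if name in record}


person = {
    "id": 7,
    "name": "Sam",
    "addresses": [{"city": "Oslo"}],
    "orders": [{"id": 1}, {"id": 2}],
}
print(project(person, "id,name"))  # {'id': 7, 'name': 'Sam'}
```

Ideally the field list is also pushed down into the data query, so the nested objects are never loaded when the client did not ask for them.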
As you have mentioned, you can break your heavy object into sub-resources, so that you can include only links to nested objects inside the person, without including the whole representations of the inner objects. The HATEOAS approach fits this purpose well, but it adds extra complexity to your application (and extra flexibility too).
In the end, you have to choose which approach is better for your particular application, but I think a good starting point is the approach with the list of requested fields.

How to manage a pool via a RESTful interface

As I am not sure I stated the question very well originally, I am restating it to see if there is a better response.
I have a problem with how best to manage a specific kind of collection with a RESTful API. To help illustrate the issue, I will use a simple artificial example. Let's call it the 'Raffle Ticket Selector'. For this question I am only interested in how to perform one function.
I have a collection of unpurchased raffle tickets (raffleTickets). Each with a unique Raffle Number along with other information.
I need to be able to take an identified number of tickets (numTickets) from the raffleTickets collection without uniquely selecting them. The collection itself has a mechanism for random selection.
The result is that I am returned 5 unique tickets from the collection and the size of the collection is decreased by 5 as the 5 returned have been removed.
The question is, how do I do it in a RESTful way?
I intuitively want to do METHOD .../raffleTickets?numTickets=5 but struggle with which HTTP method to use
In answering, you are not allowed to suggest that I just PATCH/PUT a status change to effect a removal by marking them taken. It must result in an actual change in the cardinality of the collection.
Note: Calling the method twice will return a different result set every time and will always alter the collection on which it is performed (unless it is empty!)
So what method should I use? PUT? POST? DELETE? PATCH? Idempotence restrictions would seem to leave me with only POST and PATCH, neither of which feels ideal to me. Or perhaps there is another way of providing the overall behavior that is considered the correct approach.
I am really interested to know what is best practice and understand why.
Cheers
Original Post on which the first response was based:
I have a pool of a given item which is to be managed with a RESTful API. Now adding items to the pool is not an issue, but how do I take items from the pool? Is it also a POST or is it a DELETE?
Let's say it is a pool of random numbers and I want to retrieve a variable number of items in a single method call.
I have two scenarios:
I am not checking them out as once taken they will not be returned to the pool.
I only want to check them out and they effectively remain part of the pool but have a status altered to 'inUse'
The important thing in each case is I do not care which items I get, I just want N of them.
What is considered the RESTful way of performing each of the two actions on the pool? I have an opinion on the second option but I dither on the former, so I am interested in your thoughts on both so that I better understand the thought pattern.
Thanks
Not sure if I understood your question well. It will mostly depend on the way you developed the API side of your REST communication.
In a generic solution, you would use DELETE to take items out of a list. However, if you just want to PARTIALLY update the items, you could use PATCH instead of POST or PUT.
Give this a look: http://restcookbook.com/HTTP%20Methods/patch/