REST API retrieving many subresources efficiently

Let's assume I have a REST API for a bulletin board with threads and their comments as a subresource, e.g.
/threads
/threads/{threadId}/comments
/threads/{threadId}/comments/{commentId}
The user can retrieve all threads with /threads, but what is an efficient/good way to retrieve all comments?
I know that HAL can embed subresources directly into a parent resource, but that potentially means sending a lot of data over the network, even if the client does not need the subresource. Also, I guess paging is difficult to implement (let's say one thread contains several hundred posts).
Should there be a different endpoint representing the SQL query where threadId in (..., ..., ...)? I'm having a hard time naming this endpoint in the strict resource-oriented fashion.
Or should I just let the client retrieve each subresource individually? I guess this boils down to the N+1 problem. But maybe it's not such a big deal, as the client could start retrieving all subresources at once, and the responses should come back almost simultaneously? A drawback I can think of is that this more or less forces the API client to use non-blocking IO (as otherwise the client may need to open 20 threads for a page size of 20 - or even more), which might not be so straightforward in some frameworks. Also, with HTTP 1.1, clients typically open at most about 6 simultaneous connections per host, right?
I actually now tend toward the last option, with a focus on HTTP 2 and non-blocking IO (or even server push?) - although some simpler clients may not support this. At least the API would be clean and would not have to be changed just to work around technical difficulties. Roughly, the client side could look like the sketch below.
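(A minimal sketch using Java's built-in java.net.http client; the host api.example.com and the thread IDs are placeholders, not part of any real API. All comment subresources are requested at once, and HTTP 2 multiplexes them over a single connection, sidestepping the HTTP 1.1 per-host connection limit.)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CommentsFetcher {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2) // multiplex everything over one connection
                .build();

        List<Integer> threadIds = List.of(1, 2, 3, 4, 5); // e.g. the 20 threads of the current page

        // Fire all requests at once without blocking; each future completes independently.
        List<CompletableFuture<String>> pending = threadIds.stream()
                .map(id -> client.sendAsync(
                                HttpRequest.newBuilder(URI.create(
                                        "https://api.example.com/threads/" + id + "/comments"))
                                        .GET().build(),
                                HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body))
                .toList();

        pending.forEach(f -> System.out.println(f.join()));
    }
}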
Is there any other option I have missed?

Related

Single endpoint instead of API - what are the disadvantages?

I have a service which is exposed over HTTP. Most of the traffic gets into it via a single HTTP GET endpoint, in which the payload is serialized and encrypted (RSA). The client systems share common code, which ensures that the serialization and deserialization will succeed. One of the encoded parameters is the operation type; in my service there is a huge switch (almost 100 cases) that checks which operation is performed and executes the proper code.
switch (operationType) { // operation type decoded from the request payload
    case OPERATION_1: {
        operation = new Operation1Class(basicRequestData, serviceInjected);
        break;
    }
    case OPERATION_2: {
        operation = new Operation2Class(basicRequestData, anotherServiceInjected);
        break;
    }
    // ... almost 100 cases in total
}
The operations fall into a few types: some of them are typical resource operations (GET_something, UPDATE_something), some of them are method-based (VALIDATE_something, CHECK_something).
I am thinking about refactoring the API of the service so that it is more RESTful, especially in the resource-based part of the system. To do so I would probably split the single endpoint into proper endpoints (e.g. /resource/{id}/subresource) or RPC-like endpoints (/validateSomething), as sketched below. I feel it would be better, however I cannot come up with any argument for this.
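(Roughly what I have in mind - a sketch only, using Spring MVC purely as an illustration; the framework, class names and payload types are all invented, not what the service uses today. Each former switch case becomes either a resource endpoint with a standard method, or an explicit RPC-style endpoint.)

import org.springframework.web.bind.annotation.*;

@RestController
public class ServiceController {

    @GetMapping("/resources/{id}")          // replaces case GET_SOMETHING
    public Resource getResource(@PathVariable long id) {
        return lookup(id);
    }

    @PutMapping("/resources/{id}")          // replaces case UPDATE_SOMETHING
    public Resource updateResource(@PathVariable long id, @RequestBody Resource body) {
        return save(id, body);
    }

    @PostMapping("/validateSomething")      // replaces case VALIDATE_SOMETHING
    public ValidationResult validateSomething(@RequestBody Resource body) {
        return check(body);
    }

    // Placeholder domain logic.
    private Resource lookup(long id) { return new Resource(id); }
    private Resource save(long id, Resource body) { return body; }
    private ValidationResult check(Resource body) { return new ValidationResult(true); }

    record Resource(long id) {}
    record ValidationResult(boolean ok) {}
}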
The question is: what are the advantages of the refactored solution, and what follows: what are the disadvantages of the current solution?
The current solution separates client from server, it's scalable (adding a new endpoint only requires adding a new operation type to the common code) and quite clear, and two clients use it in two different programming languages. I know that the API sits at level 0 of the Richardson Maturity Model, however I cannot come up with a reason why I should change it to level 3 (or at least level 2 - resources and methods).
Most of traffic input gets into it via single HTTP GET endpoint, in which the payload is serialized and encrypted (RSA)
This is potentially a problem, because the HTTP specification is quite clear that GET requests with a payload are out of bounds.
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
It's probably worth taking some time to review this, because it seems that your existing implementation works, so what's the problem?
The problem here is interop - can processes controlled by other people communicate successfully with the processes that you control? The HTTP standard gives us shared semantics for our "self descriptive messages"; when you violate that standard, you lose interop with things that you don't directly control.
And that in turn means that you can't freely leverage the wide array of solutions that we already have in support of HTTP, because you've introduced this inconsistency in your case.
The appropriate HTTP method to use for what you are currently doing? POST
REST (aka Richardson Level 3) is the architectural style of the world wide web.
Your "everything is a message to a single resource" approach gives up many of the advantages that made the world wide web catastrophically successful.
The most obvious of these is caching. "Web scale" is possible in part because the standardized caching support greatly reduces the number of round trips we need to make. However, the grain of caching in HTTP is the resource -- everything keys off of the target-uri of a request. Thus, by having all information shared via a single target-uri, you lose fine-grained caching control.
You also lose safe request semantics - with every message buried in a single method type, general purpose components can't distinguish between "effectively read only" messages and messages that request that the origin server modify its own resources. This in turn means that you lose pre-fetching, and automatic retry of safe requests when the network is unstable.
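To make the caching loss concrete, here is a minimal sketch (using the JDK's built-in HttpServer purely for illustration, not your stack): once each resource has its own target-uri, a safe GET response can be marked cacheable, and any general-purpose intermediary can serve repeat requests without ever reaching your origin server - something that is impossible when every message funnels through one URI and one method.

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class CacheDemo {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        // One target-uri per resource: HTTP caches key on that URI, so this
        // GET response can be reused for 60 seconds by any intermediary.
        server.createContext("/catalog/items", exchange -> {
            byte[] body = "[{\"id\":1}]".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Cache-Control", "public, max-age=60");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });

        server.start();
    }
}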
In all, you've taken a rather intelligent application protocol and crippled it, leaving yourself with a transport protocol.
That's not necessarily the wrong choice for your circumstances - SOAP is a thing, after all, and again, your service does seem to be working as is, which implies that you don't currently need the capabilities that you've given up.
It would make me a little bit suspicious, in the sense that if you don't need these things, why are you using HTTP rather than some messaging protocol?

Rest API vs GraphQL - Idempotent

This question/answer from another user was very informative as to what idempotent means:
What is an idempotent operation?
When it comes to REST APIs, since caching for GET requests can quickly be enabled if it isn't already, a user wanting to fetch, for example, /users/:id or /posts/:id can do so as many times as they'd like, and it shouldn't mutate any data.
If I'm understanding correctly, a GET request is idempotent in this case.
QUESTION
I believe Relay and Dataloader can help with GraphQL queries as far as caching, but they don't address browser/mobile caching.
If we're talking about just the GET-like portion of GraphQL, which goes through a single endpoint, what tech/features or otherwise could I use to get the caching benefits that regular HTTP requests provide?
Idempotence vs Caching
First of all, caching and idempotence are different things and do not necessarily relate to one another. Caching may or may not be used to implement idempotent operations - it is certainly not a requirement.
Secondly, when speaking of HTTP requests, idempotence essentially concerns the state of the server rather than its responses. An idempotent operation will leave the server in the exact same state if performed multiple times. It does not mean that the responses returned by an idempotent operation will be the same (although they often might be).
GET requests in REST are expected to be idempotent by contract - i.e. a GET operation must not have any state-altering side-effects on the server (technically this may not always be true, but for the sake of the explanation let's assume it is). This does not mean that if you GET the same resource multiple times, you'll always get the same data back. Actually, that wouldn't make sense, since resources change over time and may be deleted as well, right? So caching GET responses can help with performance but has nothing to do with idempotence.
Another way to look at it is that you are querying for a resource rather than arbitrary data. A GET request is idempotent in that you'll always get the same resource and you won't modify the state of the system in any way.
Finally, a poorly (oddly?) developed GET operation on the server side might have side-effects, which would violate the REST contract and make the operation non-idempotent. Simple response caching would not help in such a case.
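A contrived illustration in plain Java (no framework; all names made up):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class UserHandlers {
    private final Map<Long, String> users = new ConcurrentHashMap<>(Map.of(42L, "Alice"));
    private final AtomicLong viewCount = new AtomicLong();

    // Idempotent: reading leaves server state untouched, however often it runs.
    String getUser(long id) {
        return users.get(id);
    }

    // NOT idempotent as a GET: server state changes on every call, even though
    // from the client's point of view it "just reads".
    String getUserAndCountView(long id) {
        viewCount.incrementAndGet(); // side-effect per request
        return users.get(id);
    }
}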
GraphQL
I like to see GraphQL queries as equivalent to GETs in REST. This means that if you query for data in GraphQL, the resolvers must not perform any side-effects. This would ensure that performing the same query multiple times will leave the server in an unchanged state.
Just like in a simple GET you'd be querying for specific resources, although unlike in GET, GraphQL allows you to query for many instances of many different types of resources at once. Once again, that does not mean identical responses, first and foremost because the resources may change over time.
If some of your queries have side-effects (i.e. they alter the state of the resources on the server), they are not idempotent! You should probably use mutations instead of queries to achieve these side-effects. Using mutations would make it clear to the client/consumer that the operation is not idempotent and should be treated accordingly (mutation inputs may accept idempotency keys to ensure Stripe-like idempotency, but that's a separate topic - see the sketch below).
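If you're curious, here's a bare-bones sketch of the idempotency-key idea; a real implementation would persist the keys and expire them, and the names here are purely illustrative:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PaymentMutation {
    private final Map<String, String> processed = new ConcurrentHashMap<>();

    // Replays with the same key return the stored result instead of
    // re-executing the side-effect, so the mutation becomes safely retryable.
    String createPayment(String idempotencyKey, long amountCents) {
        return processed.computeIfAbsent(idempotencyKey, key -> chargeCard(amountCents));
    }

    private String chargeCard(long amountCents) {
        return "payment-" + amountCents; // placeholder for the real charge
    }
}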
Caching GraphQL responses
I hope that by now it's clear that caching is not what determines the idempotency of GraphQL queries, nor is it required to ensure it. It's only used to improve performance.
If you are still interested in server-side caching options for GraphQL, there are plenty of resources. You could start by reading what the Apollo Server documentation has to say on the topic. Don't forget that you can also cache database/service/etc. responses. I am not going to provide any specific suggestions since, judging by your question, the greater confusion lies elsewhere.

Rest Security Ensure Resource Delete

Background: I'm a new developer fresh out of college at a company that uses the RPC architectural style for a lot of its internal services. They also seem to change which tool they use behind the scenes pretty frequently, so the tight coupling between the client and server implementations in RPC is problematic. I was tasked with rewriting one of the services, and I feel a RESTful API would be a good match because the backing technology can only deal with files anyway, but I have a few questions.
My understanding of REST so far is that you break operations up as much as possible and shift the focus to resources, so the client and the server together make a state machine, with the server mainly handling the transitions through hypermedia.
Example: say you have a service that takes a file and splits it in two byte-wise. I would design the sequence for this like:
1. The client POSTs the file they want split; the server splits the file, writes both result pieces to a temp folder, and returns the URIs of both pieces that the client should GET.
2. The client sends a GET for a piece; the server returns the piece and indicates that the client should DELETE that URI.
3. The client sends a DELETE for the URI.
Steps 2 and 3 are done for both pieces. My question is: how do you ensure that the pieces get deleted at the end?
- A client could just not follow step 3.
- If you combine steps 2 and 3, a malicious (or negligent) client could just stop after step 1.
- But if you combine them all, isn't that just RPC over HTTP?
If the 2 pieces in question are inseparable, then they are in fact just properties of a single resource.
And yes, if a POST/PUT must be followed by a DELETE, then you're probably just trying to shoehorn RPC into a REST-style architecture.
There's no real definition of what "REST" actually is, but the one thing certain about it is that it MUST be stateless; i.e. every separate request must be self-sufficient - it cannot depend on a previous request, and cannot mandate subsequent requests.

How to pass a large number of input parameters to RESTful service?

I have a RESTful service that returns detailed data about machines by a supplied list of IDs:
GET api/machine/
http://service.com/api/machine/1,2,3,4
Up till now this has been fine, since I am getting a small number of machines at a time, but now I need to get all machines (more than 1000). This exceeds the de facto 2000-character limit on URLs.
I have gotten both of the options below to work and I'm looking for some community feedback on which way to go.
Option 1: Split up my GET. Make multiple calls with a subset of the IDs. Pros: since I am doing a read, using the HTTP verb GET makes sense. Cons: if a person new to the service doesn't know about this limit, or doesn't use my client, it would cause problems.
Option 2: Add a PUT/POST method and include the full list of IDs in the body. Pros: makes one call to get all the data. Cons: I am now doing a read via a PUT/POST.
Probably your best course of action would be something along the lines of option 2: you can create a JSON document on your side with an array of the IDs you want to send in the body of the message. If there's a possibility of it still being far too large, you can split it into several messages; when you receive the response to one, you'd send the next item in the queue, and so on.
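For example, a minimal sketch with Java's java.net.http client - note that the /api/machine/query path and the JSON shape are assumptions; use whatever your service actually defines:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MachineQuery {
    public static void main(String[] args) throws Exception {
        // The IDs travel in the body, so URL length limits no longer apply.
        String idsJson = "{\"ids\": [1, 2, 3, 4]}"; // build this from your real list

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://service.com/api/machine/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(idsJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}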
Another option, used by the Facebook API among others, is to create a "/batch" POST method which can be used to make multiple requests in one go.
So instead of having http://service.com/api/machine/1,2,3,4,5,... you'll have a batch of requests with /machine/1, /machine/2, /machine/3, etc.
The advantage is that you keep clean RESTful URLs (no more comma-separated values) and it scales very well, since you can batch as many requests as you want.
The disadvantage is that it is slightly more complex to build.
See here for more information - https://developers.facebook.com/docs/graph-api/making-multiple-requests
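The request body for such a batch endpoint might look roughly like this (a simplified, hypothetical shape - Facebook's actual format is described in the linked docs):

public class BatchPayload {
    public static void main(String[] args) {
        // Each entry names a method and a relative URL; the server executes them
        // and returns an array of per-request results in the same order.
        String batchJson = """
                [
                  {"method": "GET", "relative_url": "/machine/1"},
                  {"method": "GET", "relative_url": "/machine/2"},
                  {"method": "GET", "relative_url": "/machine/3"}
                ]
                """;
        System.out.println(batchJson); // POST this to e.g. http://service.com/api/batch
    }
}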

What to do about huge resources in REST API

I am bolting a REST interface onto an existing application, and I'm curious about the most appropriate way to deal with resources that would return an exorbitant amount of data if they were retrieved in full.
The application is an existing timesheet system and one of the resources is a set of a user's "Time Slots".
An example URI for these resources is:
/users/44/timeslots/
I have read a lot of questions that relate to how to provide the filtering for this resource to retrieve a subset and I already have a solution for that.
I want to know how (or if) I should deal with the situation that issuing a GET on the URI above would return megabytes of data from tens or hundreds of thousands of rows and would take a fair amount of server resource to actually respond in the first place.
Is there an HTTP response that is used by convention in these situations?
I found HTTP status code 413, which relates to a request entity that is too large, but there is not one that would be appropriate for when the response entity would be too large.
Is there an alternative convention for limiting the response or telling the client that this is a silly request?
Should I simply let the server comply with this massive request?
EDIT: To be clear, I have filtering and splitting of the resource implemented and have considered pagination on other large collection resources. I want to respond appropriately to requests which don't make sense (and have obviously been requested by a client constructing a URI).
You are free to design your URIs as you want, encoding any concept.
So, depending on your users (humans/machines), you can use that as a split on a conceptual level based on your problem space or domain. As you mentioned, you probably have something like this:
/users/44/timeslots/afternoon
/users/44/timeslots/offshift
/users/44/timeslots/hours/1
/users/44/timeslots/UTC1624
One can also limit by ideas/concepts as above. You filter more by adding queries: /users/44/timeslots?day=weekdays&dow=mon
Making use of concepts and filters like this will naturally limit the response size. But you need to try to design your API so that you do not get into that situation in the first place. If your client misbehaves, give it a 400 Bad Request. If something goes wrong on your server side, use a 5XX code.
Make use of one of the tools of REST - hypermedia and links (see also HATEOAS). Link to the next part of your hypermedia; make use of "chunk-like concepts" that your domain understands (pages, time slots). There is no need to download megabytes, which is also bad for caching and in turn hurts scalability/speed.
timeslots is a collection resource; why not simply enable pagination on that resource?
see here: Pagination in a REST web application
Calling GET on the collection without page information simply returns the first page (with a default page size).
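A sketch of that defaulting logic (parameter names and the default size are illustrative):

import java.util.List;

class TimeslotPager {
    private static final int DEFAULT_PAGE_SIZE = 50;

    // GET /users/44/timeslots/        -> page 0, default size
    // GET /users/44/timeslots/?page=3 -> page 3, default size
    List<String> page(List<String> all, Integer page, Integer size) {
        int p = (page == null) ? 0 : page;
        int s = (size == null) ? DEFAULT_PAGE_SIZE : Math.min(size, 500); // cap client-chosen sizes
        int from = Math.min(p * s, all.size());
        int to = Math.min(from + s, all.size());
        return all.subList(from, to);
    }
}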
Should I simply let the server comply with this massive request?
I think you shouldn't, but that's up to you to decide: can the server handle big volumes? Do you consider it a valid use case?
This may be too weak of an answer, but here is how my team has handled it. Large resources like that are required to have the additional filtering information provided. If the filtering information is not there to keep the size within a specific range, then we return an Internal Server Error (500) with an appropriate message to denote that it was a failure to use the RESTful API properly.
Hope this helps.
You can use a custom Range header - see http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html
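For example, a client could ask for just the first 25 items with a custom "items" range unit - a convention your server has to opt into; sketched here with Java's built-in HTTP client:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RangedFetch {
    public static void main(String[] args) throws Exception {
        // Ask for the first 25 items; a cooperating server would answer
        // 206 Partial Content with something like "Content-Range: items 0-24/53000".
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://api.example.com/users/44/timeslots/"))
                .header("Range", "items=0-24")
                .GET().build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // expect 206 if the server supports it
    }
}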
Or you can (as others have suggested) split your resource up into smaller resources at different URLs (representing sections, or pages, or otherwise filtered versions of the original resource).