Rest API vs GraphQL - Idempotent - rest

This question/answer from another user was very informative as to what idempotent means:
What is an idempotent operation?
When it comes to Rest API, since caching for GET requests can quickly be enabled if not already, if a user is wanting to fetch some examples: /users/:id or /posts/:id, they can do so as many times as they'd like and it shouldn't mutate any data.
If I'm understanding correctly, a GET request is idempotent in this case.
QUESTION
I believe Relay and Dataloader can help with GraphQL queries as far as caching, but doesn't address browser/mobile caching.
If we're talking about just the GET request portion of GraphQL, it's part of a single endpoint, what could I use tech/features or otherwise, that would address the benefits that regular http requests provide caching-wise.

Idempotence vs Caching
First of all, caching and idempotence are different things and do not necessarily relate to one another. Caching may or may not be used to implement idempotent operations - it is certainly not a requirement.
Secondly, when speaking of HTTP requests, idempotence essentially concerns the state of the server rather than its responses. An idempotent operation will have leave the server in the exact same state if performed multiple times. It does not mean that the responses returned by an idempotent operation will be the same (although they often might be).
GET requests in REST are expected to be idempotent by contract - i.e. a GET operation must not have any state-altering side-effects on the server (technically this may not always be true, but for the sake of the explanation let's assume it is). This does not mean that if you GET the same resource multiple times, you'll always get the same data back. Actually, that wouldn't make sense since resources change over time and may be deleted as well right? So caching GET responses can help with performance but has nothing to do with idempotence.
Another way to look at it is that you are querying for a resource rather than arbitrary data. A GET request is idempotent in that you'll always get the same resource and you won't modify the state of the system in any way.
Finally, a poorly (oddly?) developed GET operation on the server side might have side-effects, which would violate the REST contract and make the operation non idempotent. Simple response caching would not help in such a case.
GraphQL
I like to see GraphQL queries as equivalent to GETs in REST. This means that if you query for data in GraphQL, the resolvers must not perform any side-effects. This would ensure that performing the same query multiple times will leave the server in an unchanged state.
Just like in a simple GET you'd be querying for specific resources, although unlike in GET, GraphQL allows you to query for many instances of many different types of resources at once. Once again, that does not mean identical responses, first and foremost because the resources may change over time.
If some of your queries have side-effects (i.e. they alter the state of the resources on the server), they are not idempotent! You should probably use mutations instead of queries to achieve these side-effects. Using mutations would make it clear to the client/consumer that the operation is not idempotent and should be treated accordingly (mutation inputs may accept idempotence keys to ensure Stripe-like idempotency, but that's a separate topic).
Caching GraphQL responses
I hope that by now it's clear that caching is not required to ensure / is not what determines the idempotency of GraphQL queries. It's only used to improve performance.
If you are still interested in server-side caching options for GraphQL, there are plenty of resources. You could start by reading what Apollo Server documentation has to say on the topic. Don't forget that you can also cache database/service/etc. responses. I am not going provide any specific suggestions since, judging by your question, there's much grater confusion elsewhere.

Related

REST API retrieving many subresources efficiently

Let's assume I have a REST API for a bulletin board with threads and their comments as a subresource, e.g.
/threads
/threads/{threadId}/comments
/threads/{threadId}/comments/{commentId}
The user can retrieve all threads with /threads, but what is an efficient/good way to retrieve all comments?
I know that HAL can embeded subresources directly into a parent resource, but that possibly means sending much data over the network, even if the client does not need the subresource. Also, I guess paging is difficult to implement (let's say one thread contains many hundred posts).
Should there be a different endpoint representing the SQL query where threadId in (..., ..., ...)? I'm having a hard time to name this endpoint in the strict resource oriented fashion.
Or should I just let the client retrieve each subresource individually? I guess this boils down to the N+1 problem. But maybe it's not so much of a deal, as they client could start to retrieve all subresources at once, and the responses should come back simulataneously? I could think of the drawback that this more or less forces the API client to use non-blocking IO (as otherwise the client may need to open 20 threads for a page size of 20 - or even more), which might not be so straight-forward in some frameworks. Also, with HTTP 1.1, only 6 simulatenous requests are allowed per TCP connection, right?
I actually now tend to the last option, with a focus on HTTP 2 and non-blocking IO (or even server push?) - although some more simpler clients may not support this. At least the API would be clean and does not have to be changed just to work around technical difficulties.
Is there any other option I have missed?

Should a PUT request be compatible with a GET response from the same resource

I'm reviewing a number of endpoints that have been developed against the beginnings of what will be a complex API.
Are there any industry recommended best practises that say that you should ensure that a response of a GET request can be used as the payload for a PUT request against the same resource.
Additionally, I would appreciate feedback of potential pitfalls people have had taking this approach.
I would say 'it helps', but it's not strictly necessary. Because content-negotiation and multiple representations are a thing, you could use a different media type for 'changing' vs 'retrieving' the state of a resource.
If the representation of your request is a simple subset, maybe PATCH is what you're looking for.
If the response of a PUT request contains an ETag it is expected that the request body of the PUT perfectly matches byte-for-byte the current state. In reality this is rarely the case though, because API's tend to re-serialize their model and will often have subtle differences.
Having that ETag there is helpful though, because a smarter client can read that ETag and use that to warm their current cached state of the resource, and also avoids a race condition if more PUT's are needed in the future and the client wants to use ETags to avoid the lost update problem.
For me personally, I like the ability to GET a resource, change a single property PUT the complete resource again.

How to handle network connectivity loss in the middle of REST POST request?

REST POST is used to create resources.
Let's say we have resource url
"http://example.com/cars"
We want to create a new car.
We POST to "http://example.com/cars" with JSON payload containing car properties (color, weight, model, etc).
Server receives the request, creates a new car, sends a response over the network.
At this point network fails (let's say router stops working properly and ignores every packet).
Client fails with TCP timeout (like 90 seconds).
Client has no idea whether car was created or not.
Also client haven't received car resource id, so it can't GET it to check if it was created.
Now what?
How do you handle this?
You can't simply retry creating, because retrying will just create a duplicate (which is bad).
REST POST is used to create resources.
HTTP POST is used for lots of things. REST doesn't particularly care; it just wants resources that support a uniform interface, and hypermedia.
At this point network fails
Bummer!
Now what? How do you handle this? You can't simply retry creating, because retrying will just create a duplicate (which is bad).
This is a general messaging concern, not directly related to REST. The most common solution is to use the Idempotent Receiver pattern. In short, you
need to define your messages so that the receiver has enough information to recognize the request as something that has already been done.
Ideally, this is being supported at the business level.
Idempotent collections of values are often straight forward; we just need to be thinking sets, rather than lists.
Idempotent collections of entities are trickier; if the request includes an identifier for the new entity, or if we can compute one from the data provided, then we can think of our collection as a hash.
If none of those approaches fits, then there's another possibility. Instead of performing an idempotent mutation of the collection, we make the mutation of the collection itself idempotent. Think "compare and swap" - we encode into the request information that identifies the current state of the collection; is that state is still current when the request arrives, then the mutation is applied. If the condition does not hold, then the request becomes a no-op.
Translating this into HTTP, we make a small modification to the protocol for updating the collection resource. First, we GET the current representation; and in the meta data the server provides validators that can be used in subsquent requests. Having obtained the validator, the client evaluates the current representation of the resource to determine if it needs to be changed. If the client decides to make a change, then submits the change with an If-Match or an If-Unmodified-Since header including the validator. The server, before processing the requests, then considers the validator, immediately abandoning the request with 412 Precondition Failed.
Thus, if a conditional state-changing request is lost, the client can at its own discretion repeat the request without concern that server will misunderstand the client's intent.
Retry it a limited number of times, with increasing delays between the attempts, and make sure the transaction concerned is idempotent.
because retrying will just create a duplicate (which is bad).
It is indeed, and it needs fixing, see above. It should be impossible in your system to create two entries with the same attributes. This is easily accomplished at the database level. You can attain idempotence by having the transaction return the same thing whether the entry already existed or was newly created. Or else just have it return EXISTS if the entry already exists, and adjust your client accordingly.

Why use two step approach to deleting multiple items with REST

We all know the 'standard' way of deleting a single item via REST is to send a single DELETE request to a URI example.com/Items/666. Grand, let's move on to deleting many at once. As we do not require atomic deleting (or true transaction, ie all or nothing) we could just tell the client 'tough luck, make many requests' but that's not very nice is it. So we need a way to allow a client to request many 'Items' be deleted at once.
From my understanding, the 'typical' solution to this problem is a 'two step' approach. First the client POSTs a list of item IDs and is returned a URI such as example.com/Items/Collection/1. Once that collection is created, they call DELETE on it.
Now, I see that this works just fine, except to me, it is a bad solution. Firstly, you are forcing the client to make two requests to accommodate the server. Secondly, 'I thought DELETE was supposed to delete an Item?', shouldn't calling DELETE on this URI effectively cancel the transaction (it's not a true transaction though), how would we even cancel it? Really would be better if there was some form 'EXECUTE' action, but I can't rock the boat that much. It also forces the server to have to consider 'the JSON that was POSTed looks more like a request to modify these Items, but the request was DELETE... so I guess I will delete them'. This approach also starts to impose a sort of state on the client/server, not a true state I will admit, but it is sort of.
In my opinion, a better solution would be to simply call DELETE on example.com/Items (or maybe example.com/Items/Collection to imply this is a multiple delete) and pass JSON data containing a list of IDs that you wish to delete. As far as I can see, this basically solves all the problems the first method had. It is easier to use as a client, reduces the work the server has to do, is truly stateless, is more semantic.
I would really appreciate the feed back on this, am I missing something about REST that makes my solution to this problem unrealistic? I would also appreciate links to articles, especially if they compare these two methods; I am aware this is not normally approved of for SO. I need to be able to disprove that only the first method is truly RESTfull, prove that the second approach is a viable solution. Of course, if I am barking up the wrong tree do tell me.
I have spent the last week or so reading a fair bit on REST, and to the best of my understanding, it would be wrong to describe either of these solutions as 'RESTfull', rather you should say that 'neither solution goes against what REST means'.
The short answer is simply that REST, as laid out in Roy Fielding's dissertation (See chapter 5), does not cover the topic of how to go about deleting resources, singular or multiple, in a REST manor. That's right, there is no 'correct RESTful way to delete a resource'... well, not quite.
REST itself does not define how delete a resource, but it does define that what ever protocol you are using (remember that REST is not a protocol) will dictate the how perform these actions. The protocol will usually be HTTP; 'usually' being the key word as Fielding will point out, REST is not synonymous with HTTP.
So we look to HTTP to say which method is 'right'. Sadly, as far as HTTP is concerned, both approaches are viable. Yes 'viable'. HTTP will allow a client to send a POST request with a payload (to create a collection resource), and then call a DELETE method on this new collection to delete the resources; it will also allow you to send the data within the payload of a single DELETE method to delete the list of resources. HTTP is simply the medium by which you send requests to the server, it would be up to the server to respond appropriately. To me, the HTTP protocol seems to be rather open to interpretation in places, but it does seem to lay down fairly clear guide lines for what actions mean, how they should be dealt with and what response should be given; it's just it is a 'you should do this' rather than 'you must do this', but perhaps I am being a little pedantic on the wording.
Some people would argue that the 'two stage' approach cannot possibly be 'REST' as the server has to store a 'state' for the client to perform the second action. This is simply a misunderstanding of some part. It must be understood that neither the client nor the server is storing any 'state' information about the other between the list being POSTed and then subsequently being DELETEd. Yes, the list must have been created before it can deleted, but the server does not remember that it was client alpha that made this list (such an approach would allow the client to simply call 'DELETE' as the next request and the server remembers to use that list, this would not be stateless at all) as such, the client must tell the server to DELETE that specific list, the list it was given a specific URI for. If the client attempted to DELETE a collection list that did not already exist it would simply be told 'the resource can not be found' (the classic 404 error most likely). If you wish to claim that this two step approach does maintain a state, you must also claim that to simply GET an URI requires a state, as the URI must first exist. To claim that there is this 'state' persisting is misunderstanding what 'state' means. And as further 'proof' that such a two stage approach is indeed stateless, you could quite happily have client alpha POST the list and later client beta (without having had any communication with the other client) call DELETE on the list resources.
I think it can stand rather self evident that the second option, of just sending the list in the payload of the DELETE request, is stateless. All the information required to complete the request is stored completely within the one request.
It could be argued though that the DELETE action should only be called on a 'tangible' resource, but in doing so you are blatantly ignoring the REpresentational part of REST; It's in the name! It is the representational aspect that 'permits' URIs such as http://example.com/myService/timeNow, a URI that when 'got' will return, dynamically, the current time, with out having to load some file or read from some database. It is a key concept that the URIs are not mapping directly to some 'tangible' piece of data.
There is however one aspect of that stateless nature that must be questioned. As Fielding describes the 'client-stateless-server' in section 5.1.3, he states:
We next add a constraint to the client-server interaction: communication must
be stateless in nature, as in the client-stateless-server (CSS) style of
Section 3.4.3 (Figure 5-3), such that each request from client to server must
contain all of the information necessary to understand the request, and
cannot take advantage of any stored context on the server. Session state is
therefore kept entirely on the client.
The key part here in my eyes is "cannot take advantage of any stored context on the server". Now I will grant you that 'context' is somewhat open for interpretation. But I find it hard to see how you could consider storing a list (either in memory or on disk) that will be used to give actual useful meaning would not violate this 'rule'. With out this 'list context' the DELETE operation makes no sense. As such, I can only conclude that making use of a two step approach to perform an action such as deleting multiple resources cannot and should not be considered 'RESTfull'.
I also begrudge somewhat the effort that has had to be put into finding arguments either way for this. The Internet at large seems to have become swept up with this idea the the two step approach is the 'RESTfull' way doing such actions, with the reasoning 'it is the RESTfull way to do it'. If you step back for a moment from what everybody else is doing, you will see that either approach requires sending the same list, so it can be ignored from the argument. Both approaches are 'representational' and 'stateless'. The only real difference is that for some reason one approach has decided to require two requests. These two requests then come with follow up questions, such as how 'long do you keep that data for' and 'how does a client tell a server that it no longer wants that this collection, but wishes to keep the actual resources it refers to'.
So I am, to a point, answering my question with the same question, 'Why would you even consider a two step approach?'
IMO:
HTTP DELETE on existing collection to delete all of its member seems fine. Creating the collection just to delete all of the member sounds odd. As you yourself suggest, just pass IDs of the to be deleted items using JSON (or any other payload format). I think that the server should try to make multiple deletes an internal transaction though.
I would argue that HTTP already provides a method of deleting multiple items in the form of persistent connections and pipelining. At the HTTP protocol level it is absolutely fine to request idempotent methods like DELETE in a pipelined way - that is, send all the DELETE requests at once on a single connection and wait for all the responses.
This may be problematic for an AJAX client running in a browser since few browsers have pipelining support enabled by default. This is not the fault of HTTP, though, it is the fault of those specific clients.

Move resource in RESTful architecture

I have a RESTful web service which represent processes and activities. Each activity is inside one and only one process.
I would like to represent a "move" operation of activity between the process it is currently in and another process.
I've look at forums and found people suggest to use MOVE operation which is not very standard and other suggest to use PUT but then I'm not sure how to tell the difference between PUT that update and PUT that moves which looks semantically wrong.
Any ideas?
One way might be to represent the move itself as, say, a "transfer" resource (transfer as a noun), and POST a new one:
POST /transfer
With an entity containing:
activity: /activities/4
toProcess: /processes/13
This way, clients are creating new "transfers" which, on the server, handle validating and transferring the activity.
This gives you the ability to add information about the transfer, too. If you wanted to keep a history for auditing, you could add a transferredBy property to the resource, or a transferredOn date.
If using PUTs, you can tell the difference by whether the process of the existing entity matches the new one.
PUT /process1/activity2
process: 2
some_data: and_stuff
To which the logical response (if successful) is
303 See Other
Location: /process2/activity2
Given the available answers I'm not really satisfied with the proposals.
POST is an all purpose method that should be used if none of the other operations fit the bill. The semantics of a payload received are defined by the service/API only and may therefore a solution for one API but not for most ones. It further lacks the property of idempotency which in case of a network issue will leave the client in an uncertainty whether the request received the server and only the response got lost mid way or if the request failed to reach the server at all. A consecutive request might therefore lead to unexpected results or further actions required.
PUT has the semantics of replace the current representation obtainable from the resource (may be empty) with the representation provided in the payload. Servers are free to modify the received representation to a more fitting one or to append or remove further data. PUT may even have side effects on other resources as well, i.e. if a versioning mechanism for a document update is provided. While providing the above-mentioned idempotency property, PUT actually does not fit the semantics of the requested action. This might have serious implications on the interoperability as standard HTTP servers wont be able to server you correctly.
One might use a combination of POST to create the new representation on the new endpoint first and afterwards remove the old one via DELETE. However, this are two separate operations where the first one might fail and if not handled correctly lead to an immediate deletion of the original resource in worst case. There is no real transactional behavior in these set of operations unfortunately.
Instead of using the above mentioned operations I'd suggest to use PATCH. PATCH is a serious of changes calculated by the client necessary to transform a current representation to a desiered one. A server supporting PATCH will have to apply these instructions atomically. Either all of them are applied or none of them at all. PATCH can have side effects and is thus the most suitable fit to perform a move in HTTP currently. To properly use this method, however, a certain media-types should be used. One might orientate on JSON Patch (more reader-friendly) i.e., though this only defines the semantics of operations to modify state of JSON based representations and does not deal with multiple resources AFAIK.