Sending ACK after consuming the result of a RESTful web service? - rest

Consider a queue of items on server. The client then reads 10 queued items at a time using a REST web service. Naturally, when the client has consumed these items the server should remove them server-side.
Q: What is the best approach if we consider both robustness, network load and restfulness?
I can think of three possible solutions:
The client asks for new items. The server then...
sends item 1..10 (GET) and removes them immediately. Hopefully the items arrived at the client.
sends item 1..10 (GET), client sends ACK for 1..10 (DELETE), and the server removes the items.
sends item 1..10 (GET). Next time the client asks for 11..20 (GET), the previous items are removed on the server.
I believe both #1 and #3 violate the restful principle. E.g. Only the DELETE method may delete objects. However, they both avoid the data traffic for the ACK command.
Not sure what's best here. Perhaps there is an even better solution?

Here's the answer in votable format. I hope it helps clarify your options a bit more.
It's important in a REST-style architecture that the implementation of an API not change the implementation of any underlying protocols -- in this case it means that GET requests should be idempotent. While idempotent does not mean that the underlying resources can't change, or go away forever (AKA be deleted), having that happen as a direct or indirect result of a GET seems to not be in accordance with the spirit of the protocol.
Any system that guarantees delivery of a message requires some kind of handshake to single that the intended receiver successfully received the message -- if HTTP is the protocol in question, then that implies two requests. Even in the case where the behavior of GET were modified to lazily delete the resources, the handshake is still present -- it has just been shifted in time. Again, if HTTP is the protocol in questions, then using the existing methods of GET and then DELETE for the retrieval and deletion for that handshake seems best. The result shouldn't tax the network any more than any equivalent approach.

Related

If we created multiple resource of update request in REST,what will the impact at server side

If we create the multiple resource of update request using POST method in REST.what will be the impact at server side if number of resource created .
I Know using put request ,we can achieve fault tolerance due to idempotence.if we use post instead put,what will happen?
If we created number of resource using post for update , is there any performance issue ?if we created number of resource then what is impact on server ?
In post and put if we call same request n times ,we are going to hit the server n time then creating new resource and same resource should not impact on server.can please confirm this statement right or wrong .
If we create the multiple resource of update request using POST method in REST.what will be the impact at server side if number of resource created .
First of all, HTTP, the de-facto transport layer of REST, is an application protocol for transferring documents over a network and not just YOUR application domain you can run your business rules on. Any business rules you infer from sending data over the network are just a side-effect of the actual documentation management you perform via HTTP. While certain thins might map well from the documentation management to your business layer, certain things might not. I.e. HTTP isn't designed to support larger kinds of batch processing.
By that, even though HTTP itself defines a set of methods you can use, with IANA administering additional ones, the actual implementation depends on the server itself. It should follow the semantics outline in the RFC, though it might not. It may harm interoperability with other clients though in such a case, that is why it is recommended to follow the spec.
What implications or impact a request may have on the server depends on a couple of factors such as the kind of the server, the data that needs to be processed and whether work can be offloaded, i.e. by a cache, as well as the internal infrastructure you use. If you have a server that supports a couple of hundred cores and terabytes of address space a request to be processed might have less of an impact on the server than if you have a server with only a single CPU core and just a gigabyte of RAM, which has to fit in a couple of other applications as well as the OS itself. In general though the actual impact a request has on the server isn't tide to the operation you invoke as at its core HTTP is just a remote document management protocol, as explained before. Certain methods, such as PATCH, may be an exception to this rule though as it clearly demands transaction support as either all or none of the operations defined in the patch document need to be applied.
I Know using put request ,we can achieve fault tolerance due to idempotence.if we use post instead put,what will happen?
RFC 7231 includes a hint on the difference between POST and PUT:
The fundamental difference between the POST and PUT methods is highlighted by the different intent for the enclosed representation. The target resource in a POST request is intended to handle the enclosed representation according to the resource's own semantics, whereas the enclosed representation in a PUT request is defined as replacing the state of the target resource. Hence, the intent of PUT is idempotent and visible to intermediaries, even though the exact effect is only known by the origin server.
POST does not give a client any promises on what happens in case of a network error i.e. You might not know whether a request reached the server and only the response got lost or if the actual request didn't make it to the server at all. Jim Webber gave an example why idempotency is important, especially when you deal with money and currencies.
HTTP is rather specific to inform a client when a resource was created by including a HTTP Location header in the response that contains a URI to the created resource. This works on POST as well as PUT and PATCH. This premise can now be utilized to "safely" upload data. A client can send POST requests to the server until it receives a response with a Location header pointing to the created resource which is then used in the next step to perform a PUT update on that resource with the actual content. This pattern is called the POST-PUT creation pattern and it is especially useful if you have either a large payload to send or have to guarantee that the state only triggers a business rule once, i.e. in case of an online purchase.
Note that with the help of conditional requests some form of optimistic locking could be used as well, though this would require to at least know the state of the current resource beforehand as here a certain value, that is unique to the current state, is included in the request that acts as distributed lock which, if different to the state the server currently has, as there might have been an update by an other client in the meantime, will result in a rejection of the request at the server side.
If we created number of resource using post for update , is there any performance issue ?if we created number of resource then what is impact on server ?
I'm not fully sure what you mean by created a number of resources using post for update. Do you want to create or update a resource via POST? These methods just differ in the semantics they promise. How you map the event of the document modification to trigger certain business rules in your backend is completely up to you. In general though, as mentioned before, HTTP isn't ideal in terms of batch processing.
In post and put if we call same request n times ,we are going to hit the server n time then creating new resource and same resource should not impact on server.can please confirm this statement right or wrong
If you send n requests via POST to the server, the server will perform the logic that should perform on a POST request n times (if all of the requests reached the server). Whether a new resource is created or not depends on the implementation. A POST request might only start a backing process, some kind of calculation or actually doing nothing. If one was created though the response should contain a Location header with the URI that points to the location of the new resource.
In terms of sending n requests via PUT, if the same URI is used for all of these requests, the server in general should apply the payload as the new state of the targeted resource. If it internally results in a DB update or not is an implementation detail that may very from project to project. In general a PUT request does not reflect in the creation of a new resource unless the resource the target URI pointed to didn't exist before, though it also may create further resources as a side-effect. Imagine if you design some kind of version control system. PUT is allowed to have side effects. Such a side effect may be that you perform an update on the HEAD trunk, which applies the new state to the HEAD, though as a side-effect a new resource is created for that commit in the commit history.
So in summary, you can't deduce the impact a request has on a server solely based on the HTTP operation you use as at its heart HTTP is just an application protocol that transfers documents over a network. The actual business rules that get triggered are just a side effect of the actual document management. What impact a request has on the server depends on multiple factors, such as the type of the server you use but also on the length of the request and what you do with it on the server. Each of the available methods has its own semantics and you shouldn't compare them by the impact they might have on the server, but on the premise they give to a client. Certain things like anything related to a balance or money should be done via PUT due to the idempotent property of that method.

REST API that calls another REST API

Is it proper programming practice/ software design to have a REST API call another REST API? If not what would be the recommended way of handling this scenario?
If I understand your question correctly, then YES, it is extremely common.
You are describing the following, I presume:
Client makes API call to Server-1, which in the process of servicing
this request, makes another request to API Server-2, takes the
response from Server-2, does some reformatting or data extraction, and
packages that up to respond back the the Client?
This sort of thing happens all the time. The downside to it, is that unless the connection between Server-1 and Server-2 is very low latency (e.g. they are on the same network), and the bandwidth used is small, then the Client will have to wait quite a while for the response. Obviously there can be caching between the two back-end servers to help mitigate this.
It is pretty much the same as Server-1 making a SQL query to a database in order to answer the request.
An alternative interpretation of your question might be that the Client is asking Server-1 to queue an operation that Server-2 would pick up and execute asynchronously. This also is very common (it's how Google crawls your website, for instance). This scenario would have Server-1 respond to Client immediately without needing to wait for the results of the operation undertaken by Server-2. A message queue or database table is usually used as an intermediary between servers in this case.
Another approach to that is make your REST API(1) store the request details to a queue table. Make a backend that will check that queue table every let's say 100milliseconds. That backend will be the one who will call the other REST API(2).
In your REST API(1) just create a loop that will check if the transaction on queue has been processed. If yes, get the process details and return it to client, if no, just keep on looping until process is done

How to prevent sending same data to different clients in REST api GET?

I have 15 worker clients and one master connected through internet. Job & data are been passed through REST api in json format.
Jobs are not restricted to any particular client. Any worker can query for the available job in regular interval(say 30 seconds), process it and will update the status.
In this scenario, how can I prevent same records been sent to different clients while GET request.
Followings are my solution approach to overcome this issue:
Take top 5 unprocessed records from the database and make it as SENT and expose via REST GET.
But the problem is, it creates inconsistency. Some times, the client doesn't got data due to network connectivity issue. But in server, it will be marked as SENT. So, no other clients can get that data. It will remain as SENT forever.
Get the list from server, and reply back the list of job IDs to Server as received. But in-between this time gap, some other clients also getting same set of Jobs.
You've stumbled upon a fundamental problem in distributed systems: there is no way to know if the other side received your message. You can certainly improve the situation with TCP and ack messages. But if you never get the ACK did the message never arrive, did it arrive but the recipient die before processesing, or did the recipient send he ACK and the ACK get dropped?
That means you need to design your system to handle receiving data more than once.
You offer two partial solutions; if you combine them, your solution starts to look like how SQS works. Mark the item as pending_ack with a timestamp. After client replies, it is marked sent. Any pending_ackss past a certain time period are eligible to be resent.
Pick your time period to allow for slow network and slow clients and it boils down to only sending duplicates when you really don't know if the client died or not.
Maybe you should reconsider the approach to blocking resources. REST architecture - by definition is not obliged to save information about client. Instead, you may want to consider optimistic concurrency control (http://en.wikipedia.org/wiki/Optimistic_concurrency_control).

A RESTful approach to data synchronization

Assume the following scenario A web application serves up resources through a RESTful API. A number of clients consume this API. The goal is to keep the data on the clients synchronized with the web application (in both directions).
The easiest way to do this is to ask the API if any of the resources have changed since the client last synchronized with the API. This means that the client needs to ask the API for the appropriate resources accompanied by timestamp (to see if the data needs to be updated). This seems to me like the approach with the least overhead in terms of needless consumption of bandwidth.
However, I have the feeling that this approach has a few downsides in terms of design and responsibilities. For example, the API shouldn't have to deal with checking whether the resources are out of date. It seems that the only responsibility of the API should be to serve up the resources when asked without having to deal with the updating aspect. By following this second approach, the client would ask for a lot of data every time it wants to update its data to keep it synchronized with the web application. In other words, the client would check whether the data it got back is newer than the locally stored data. If this process takes place every few minutes, this might become a significant burden for the system.
Am I seeing this correctly or is there a middle road that I am overlooking?
This is a pretty common problem, and a RESTful approach can help you solve it. HTTP (the application protocol typically used to build RESTful services) supports a variety of techniques that can be used to keep API clients in sync with the data on the server side.
If the client receives a Last-Modified or E-Tag header in a HTTP response, it may use that information to make conditional GET calls in the future. This allows the server to quickly indicate with a 304 – Not Modified response that the client’s previously stored representation of the resource is still valid and accurate. This will allow the server (or even better, an intermediate proxy or cache server) to be as efficient as possible in how it responds to the client’s requests, potentially reducing costly round-trips to a back-end data store.
If a response contains a Last-Modified header and the client wishes to take advantage of the performance optimization available with it, they must include an If-Modified-Since directive in a subsequent GET call to the same URI, passing in the same timestamp value it received. This instructs the server to only GET the information from the authoritative back-end source if it knows it has changed since that time. Your server will have to be built to support this technique, of course.
A similar principle applies to E-Tag headers. An E-Tag is a simple hash code representing a specific state of the resource at a particular point in time. If the resource changes in any way, so does its E-Tag value. If the client sees an E-Tag in a response it should pass it in subsequent GET requests to the same URI, thereby allowing the server to quickly determine if the client has an up-to-date representation of the resource.
Finally, you should probably look at the long polling technique to reduce the number of repeated GET requests issued by your clients to the server. In essence, the trick is to issue very long GET requests to the server to watch for server data changes. The GET will not return a response until either the data has changed or the very long timeout fires. If the latter, the client just re-issues the same long-lived request to watch for changes again. See also topics like Comet and Web Sockets which are similar in approach.

What is a RESTful way of monitoring a REST resource for changes?

If there is a REST resource that I want to monitor for changes or modifications from other clients, what is the best (and most RESTful) way of doing so?
One idea I've had for doing so is by providing specific resources that will keep the connection open rather than returning immediately if the resource does not (yet) exist. For example, given the resource:
/game/17/playerToMove
a "GET" on this resource might tell me that it's my opponent's turn to move. Rather than continually polling this resource to find out when it's my turn to move, I might note the move number (say 5) and attempt to retrieve the next move:
/game/17/move/5
In a "normal" REST model, it seems a GET request for this URL would return a 404 (not found) error. However, if instead, the server kept the connection open until my opponent played his move, i.e.:
PUT /game/17/move/5
then the server could return the contents that my opponent PUT into that resource. This would both provide me with the data I need, as well as a sort of notification for when my opponent has moved without requiring polling.
Is this sort of scheme RESTful? Or does it violate some sort of REST principle?
Your proposed solution sounds like long polling, which could work really well.
You would request /game/17/move/5 and the server will not send any data, until move 5 has been completed. If the connection drops, or you get a time-out, you simply reconnect until you get a valid response.
The benefit of this is it's very quick - as soon as the server has new data, the client will get it. It's also resilient to dropped connections, and works if the client is disconnected for a while (you could request /game/17/move/5 an hour after it's been moved and get the data instantly, then move onto move/6/ and so on)
The issue with long polling is each "poll" ties up a server thread, which quickly breaks servers like Apache (as it runs out of worker-threads, so can't accept other requests). You need a specialised web-server to serve the long-polling requests.. The Python module twisted (an "an event-driven networking engine") is great for this, but it's more work than regular polling..
In answer to your comment about Jetty/Tomcat, I don't have any experience with Java, but it seems they both use a similar pool-of-worker-threads system to Apache, so it will have that same problem. I did find this post which seems to address exactly this problem (for Tomcat)
I'd suggest a 404, if your intended client is a web browser, as keeping the connection open can actively block browser requests in the client to the same domain. It's up to the client how often to poll.
2021 Edit: The answer above was in 2009, for context.
Today, I would suggest using a WebSocket interface with push notifications.
Alternatively, in the above suggestion, I might suggest holding the connection for 500-1000ms and check twice at the server before returning the 404, to reduce the overhead of creating multiple connections at the client.
I found this article proposing a new HTTP header, "When-Modified-After", that essentially does the same thing--the server waits and keeps the connection open until the resource is modified.
I prefer a version-based approach rather than a timestamp-based approach, since it's less prone to race conditions and gives you a little more information about what it is you're retrieving. Any thoughts to this approach?