How Caching Is Done Under The Hood in REST APIs? - rest

One of the properties of REST APIs is cacheability. I want to understand how caching is done? And is it on client side (i.e. let's say on API client Postman or Insomnia) or on server side or both?
Suppose a resource is accessed as
GET /services/data/{api_version}/{product_tag}/{resource}/{id} and
we get a response.
If we again trigger the same endpoint call almost instantly, we get another response.
Considering API did caching on first call itself, two scenarios:
Data did not change in between two calls. In that case, caching gives correct result.
Data did change between calls. If client relies on cache, stale data is served to user.
How client determines that data changed and serves latest result? Is it something related like setting a dirty bit as we do in operating systems?
I know cache invalidation determination is one of the toughest problems in computer science and depends on scenario, but in general,
What things to cache on client side and what on server side? Caching done by Postman cannot be used by Insomnia.
How to always serve latest data while using cache to its fullest?

Related

If we created multiple resource of update request in REST,what will the impact at server side

If we create the multiple resource of update request using POST method in REST.what will be the impact at server side if number of resource created .
I Know using put request ,we can achieve fault tolerance due to idempotence.if we use post instead put,what will happen?
If we created number of resource using post for update , is there any performance issue ?if we created number of resource then what is impact on server ?
In post and put if we call same request n times ,we are going to hit the server n time then creating new resource and same resource should not impact on server.can please confirm this statement right or wrong .
If we create the multiple resource of update request using POST method in REST.what will be the impact at server side if number of resource created .
First of all, HTTP, the de-facto transport layer of REST, is an application protocol for transferring documents over a network and not just YOUR application domain you can run your business rules on. Any business rules you infer from sending data over the network are just a side-effect of the actual documentation management you perform via HTTP. While certain thins might map well from the documentation management to your business layer, certain things might not. I.e. HTTP isn't designed to support larger kinds of batch processing.
By that, even though HTTP itself defines a set of methods you can use, with IANA administering additional ones, the actual implementation depends on the server itself. It should follow the semantics outline in the RFC, though it might not. It may harm interoperability with other clients though in such a case, that is why it is recommended to follow the spec.
What implications or impact a request may have on the server depends on a couple of factors such as the kind of the server, the data that needs to be processed and whether work can be offloaded, i.e. by a cache, as well as the internal infrastructure you use. If you have a server that supports a couple of hundred cores and terabytes of address space a request to be processed might have less of an impact on the server than if you have a server with only a single CPU core and just a gigabyte of RAM, which has to fit in a couple of other applications as well as the OS itself. In general though the actual impact a request has on the server isn't tide to the operation you invoke as at its core HTTP is just a remote document management protocol, as explained before. Certain methods, such as PATCH, may be an exception to this rule though as it clearly demands transaction support as either all or none of the operations defined in the patch document need to be applied.
I Know using put request ,we can achieve fault tolerance due to idempotence.if we use post instead put,what will happen?
RFC 7231 includes a hint on the difference between POST and PUT:
The fundamental difference between the POST and PUT methods is highlighted by the different intent for the enclosed representation. The target resource in a POST request is intended to handle the enclosed representation according to the resource's own semantics, whereas the enclosed representation in a PUT request is defined as replacing the state of the target resource. Hence, the intent of PUT is idempotent and visible to intermediaries, even though the exact effect is only known by the origin server.
POST does not give a client any promises on what happens in case of a network error i.e. You might not know whether a request reached the server and only the response got lost or if the actual request didn't make it to the server at all. Jim Webber gave an example why idempotency is important, especially when you deal with money and currencies.
HTTP is rather specific to inform a client when a resource was created by including a HTTP Location header in the response that contains a URI to the created resource. This works on POST as well as PUT and PATCH. This premise can now be utilized to "safely" upload data. A client can send POST requests to the server until it receives a response with a Location header pointing to the created resource which is then used in the next step to perform a PUT update on that resource with the actual content. This pattern is called the POST-PUT creation pattern and it is especially useful if you have either a large payload to send or have to guarantee that the state only triggers a business rule once, i.e. in case of an online purchase.
Note that with the help of conditional requests some form of optimistic locking could be used as well, though this would require to at least know the state of the current resource beforehand as here a certain value, that is unique to the current state, is included in the request that acts as distributed lock which, if different to the state the server currently has, as there might have been an update by an other client in the meantime, will result in a rejection of the request at the server side.
If we created number of resource using post for update , is there any performance issue ?if we created number of resource then what is impact on server ?
I'm not fully sure what you mean by created a number of resources using post for update. Do you want to create or update a resource via POST? These methods just differ in the semantics they promise. How you map the event of the document modification to trigger certain business rules in your backend is completely up to you. In general though, as mentioned before, HTTP isn't ideal in terms of batch processing.
In post and put if we call same request n times ,we are going to hit the server n time then creating new resource and same resource should not impact on server.can please confirm this statement right or wrong
If you send n requests via POST to the server, the server will perform the logic that should perform on a POST request n times (if all of the requests reached the server). Whether a new resource is created or not depends on the implementation. A POST request might only start a backing process, some kind of calculation or actually doing nothing. If one was created though the response should contain a Location header with the URI that points to the location of the new resource.
In terms of sending n requests via PUT, if the same URI is used for all of these requests, the server in general should apply the payload as the new state of the targeted resource. If it internally results in a DB update or not is an implementation detail that may very from project to project. In general a PUT request does not reflect in the creation of a new resource unless the resource the target URI pointed to didn't exist before, though it also may create further resources as a side-effect. Imagine if you design some kind of version control system. PUT is allowed to have side effects. Such a side effect may be that you perform an update on the HEAD trunk, which applies the new state to the HEAD, though as a side-effect a new resource is created for that commit in the commit history.
So in summary, you can't deduce the impact a request has on a server solely based on the HTTP operation you use as at its heart HTTP is just an application protocol that transfers documents over a network. The actual business rules that get triggered are just a side effect of the actual document management. What impact a request has on the server depends on multiple factors, such as the type of the server you use but also on the length of the request and what you do with it on the server. Each of the available methods has its own semantics and you shouldn't compare them by the impact they might have on the server, but on the premise they give to a client. Certain things like anything related to a balance or money should be done via PUT due to the idempotent property of that method.

RESTful web requests and user activity tracking websites

Someone asked me this question a couple of days ago and I didn't have an answer:
As HTTP is a stateless protocol. When we open www.google.com, can it
be called a REST call?
What I think:
When we make a search on google.com all info is passed through cookie and URL parameters. It looks like a stateless request.
But the search results are not independent of user's past request. Search results are specific to user interest and behavior. Now, it doesn't look like a stateless request.
I know this is an old question and I have read many SO answers like Why HTTP is a stateless protocol? but I am still not able to understand what happens when user activity is tracked like on google or Amazon(recommendations based on past purchases) or any other user activity based recommendation websites.
Is it RESTful or is it RESTless?
What if I want to create a web app in which I use REST architecture and still provide user-specific responses?
HTTP is stateless, however the Google Application Layer is not. The specific Cookies and their meaning is part of the Application Layer.
Consider the same with TCP/IP. IP is a stateless protocol, but TCP isn't. The existence of state in TCP embedded in IP packets does not mean that IP protocol itself has a state.
So does that make it a REST call? No.
Although HTTP is stateless & I would suspect that www.google.com when requested with cookies disabled, the results would be the same for each request, making it almost stateless (Google still probably tracks IP to limit request frequency).
But the Application Layer is not stateless. One of the principles of REST is that the system does not retain state data about about the client between requests for the purpose of modifying the responses. In the case of Google, that clearly is not happening.
It seems that the meaning of "stateless" is being (hypothetically) taken beyond its practical expression.
Consider a web system with no DB at all. You call a (RESTful) API, you always get the exactly the same results. This is perfectly stateless... But this is perfectly not a real system, either.
A real system, in practically every implementation, holds data. Moreover, that data is the "resources" that RESTful API allows us to access. Of course, data changes, due to API calls as well. So, if you get a resource's value, change its value, and then get its value again, you will get a different value than the first read; however, this clearly does not say that the reads themselves were not stateless. They are stateless in the sense that they represent the very same action (or, more exact, resource) for each call. Change has to be manually done, using another RESTful API, to change the resource value, that will then be reflected in the next call.
However, what will be the case if we have a resource that changes without a manual, standard API verb?
For example, suppose that we have a resource that counts the number of times some other resource was accessed. Or some other resource that is being populated from some other third party data. Clearly, this is still a stateless protocol.
Moreover, in some sense, almost any system -- say, any system that includes an authentication mechanism -- responds differently for the same API calls, depending, for example, on the user's privileges. And yet, clearly, RESTful systems are not forbidden to authenticate their users...
In short, stateless systems are stateless for the sake of that protocol. If Google tracks the calls so that if I call the same resource in the same session I will get different answers, then it breaks the stateless requirement. But so long as the returned response is different due to application level data, and are not session related, this requirement is not broken.
AFAIK, what Google does is not necessarily related to sessions. If the same user will run the same search under completely identical conditions (e.g., IP, geographical location, OS, browser, etc.), they will get the very same response. If a new identical search will produce different results due to what Google have "learnt" in the last call, it is still stateless, because -- again -- that second call would have produced the very same result if it was done in another session but under identical conditions.
You should probably start from Fielding's comments on cookies in his thesis, and then review Fielding's further thoughts, published on rest-discuss.
My interpretation of Fielding's thoughts, applied to this question: no, it's not REST. The search results change depending on the state of the cookie header in the request, which is to say that the representation of the resource changes depending on the cookie, which is to say that part of the resource's identifier is captured in the cookie header.
Most of the problems with cookies are due to breaking visibility,
which impacts caching and the hypertext application engine -- Fielding, 2003
As it happens, caching doesn't seem to be a big priority for Google; the representation returned to be included a cache control private header, which restricts the participation by intermediate components.

HATEOAS and server state

Trying to understand REST HATEOAS:
Suppose I have a service that has state; they are: initial, ready, running. I have a client that connects to the service, obtains a page with links that allow it to mutate the service state.
It uses one of the links to change the service's state and obtains another page with new links.
As long as there is 1 client, the state the client holds is identical with the service. But if there is a second client and it changes the service's state, the first client's representation is stale.
How is this resolved in HATEOAS? From what I've read it seems that REST is not applicable and I should maybe look at something else. If so, what?
Thanks!
This is not resolved by HATEOS (entirely). As REST is stateless this is kind of a paradox use case to keep state in client and server aligned.
Assuming I understand your requirements, yes, you're state in client 1 is stale and not the same as the one on the server. But what if the client would make a periodic call to the server to see whether some other client changed it? If so, with HATEOAS you could provide a link to serve the current state and omit the link to change the state.
#Kay - Thanks for answering.
I'm going to try to answer my own question. I realized after reading your answer that the "application" in HATEOAS is really the virtual application the client experiences when it retrieves and processes the resources it gets from the server. Its states are the pages (resources) it transitions between. The server (service?) may have its own state but it's not the same as the client's.
As long as this distinction is kept in mind, it is not unRESTful to have stale links in client 1. The server simply responds with new links reflecting its own state. And the client makes new transitions based on the updated links.
Still trying to understand. If I have it wrong, I'd appreciate some help.
Thanks!
The stateless requirement of REST refers to the ability of the server to understand and process the client request independent from any previous interactions it has made with said client. In other words, the client should be able to send a request "out of the blue" to the server (I.e., without a session saved on said server) and have the request processed. Hence there isn't a concept of login and logout in a purely RESTful architecture.
That's a different constraint than HATEOAS. Basically, "hypermedia as the engine of application state" means that all state is conveyed through the media type being used and not the connection itself. The client can (and often does) keep its own state, and can request snapshots of the state of resources from the server through resource representations (a.k.a media types).
If you want to be notified when a resource changes state, REST is (probably) the wrong choice. You'd likely want to use a different application protocol than, say, HTTP.
As Fielding says: it's not REST without HATEOAS. Don't call your service REST if it's ignore HATEAOS and stateful service can not be REST. You understood HATEOA. The server provides hyperlinks for the client which should be use to change the state located at the CLIENT SIDE.
To solve your problem: omit tend server from any state information. It will easier your life. Then implement REST as using the Richardson Maturity Model while consider information's from here.

A RESTful approach to data synchronization

Assume the following scenario A web application serves up resources through a RESTful API. A number of clients consume this API. The goal is to keep the data on the clients synchronized with the web application (in both directions).
The easiest way to do this is to ask the API if any of the resources have changed since the client last synchronized with the API. This means that the client needs to ask the API for the appropriate resources accompanied by timestamp (to see if the data needs to be updated). This seems to me like the approach with the least overhead in terms of needless consumption of bandwidth.
However, I have the feeling that this approach has a few downsides in terms of design and responsibilities. For example, the API shouldn't have to deal with checking whether the resources are out of date. It seems that the only responsibility of the API should be to serve up the resources when asked without having to deal with the updating aspect. By following this second approach, the client would ask for a lot of data every time it wants to update its data to keep it synchronized with the web application. In other words, the client would check whether the data it got back is newer than the locally stored data. If this process takes place every few minutes, this might become a significant burden for the system.
Am I seeing this correctly or is there a middle road that I am overlooking?
This is a pretty common problem, and a RESTful approach can help you solve it. HTTP (the application protocol typically used to build RESTful services) supports a variety of techniques that can be used to keep API clients in sync with the data on the server side.
If the client receives a Last-Modified or E-Tag header in a HTTP response, it may use that information to make conditional GET calls in the future. This allows the server to quickly indicate with a 304 – Not Modified response that the client’s previously stored representation of the resource is still valid and accurate. This will allow the server (or even better, an intermediate proxy or cache server) to be as efficient as possible in how it responds to the client’s requests, potentially reducing costly round-trips to a back-end data store.
If a response contains a Last-Modified header and the client wishes to take advantage of the performance optimization available with it, they must include an If-Modified-Since directive in a subsequent GET call to the same URI, passing in the same timestamp value it received. This instructs the server to only GET the information from the authoritative back-end source if it knows it has changed since that time. Your server will have to be built to support this technique, of course.
A similar principle applies to E-Tag headers. An E-Tag is a simple hash code representing a specific state of the resource at a particular point in time. If the resource changes in any way, so does its E-Tag value. If the client sees an E-Tag in a response it should pass it in subsequent GET requests to the same URI, thereby allowing the server to quickly determine if the client has an up-to-date representation of the resource.
Finally, you should probably look at the long polling technique to reduce the number of repeated GET requests issued by your clients to the server. In essence, the trick is to issue very long GET requests to the server to watch for server data changes. The GET will not return a response until either the data has changed or the very long timeout fires. If the latter, the client just re-issues the same long-lived request to watch for changes again. See also topics like Comet and Web Sockets which are similar in approach.

How often does RESTful client pull server data

I have a RESTful web-service application that I developed using the Netbeans IDE. The application uses MySQL server as its back end server. What I am wondering now is how often a client application that uses my RESTful application would refresh to reflect the data change in the server.
Are there any default pull intervals that clients get from the RESTful application? Does the framework(JAX-RS) do something about it Or is that my business to take care of.
Thanks in advance
#Abraham
There are no such rules. Only thing you can use for properly implementing this is HTTP's caching capabilities. Service must include control information how long representation of a particular resource can be cached, revalidated, never cached etc...
On client application side of things each client may decide it's own path how it will keep itself in sync with a service. It can be done by locally storing data and serve end user from local cache etc... Service can not(and shouldn't know) how clients are implemented, only thing service can do is to include caching information in response messages as i already mentioned above.
It is your responsibility to schedule the service to execute again and again. We can set time out interval but there is no pull interval.