How can an HTTP client ask the server for the latest data / to refresh the cache?

We're designing a REST service with server-side caching. We'd like to give the client an option to specifically ask for the latest data even if the cached data has not expired. I'm looking into the HTTP 1.1 spec to see if there is a standard way to do this, and the Cache Revalidation and Reload Controls appear to fit my need.
Questions:
Should we just use Cache Revalidation and Reload Controls?
If not, is it acceptable to include an If-Modified-Since header with epoch time, causing the server to always consider the resource as having changed? The spec doesn't preclude this, but I'm wondering if I'm abusing the intent of the header :)
What would be a good way to identify the resource to refresh? In our case, the URL path alone is not enough, and I'm not sure whether query or matrix parameters are considered part of a unique URL. What about using an ETag?

If your client wants a completely fresh representation of a resource, it can specify max-age=0 to ask for that. This communicates exactly the intent: the client will not accept a response older than 0 seconds.
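As a sketch, such a request could look like this (host and path are hypothetical):

GET /orders/1234 HTTP/1.1
Host: api.example.com
Cache-Control: max-age=0

Any cache holding a stored copy, which is necessarily older than zero seconds, then has to revalidate with the origin before answering.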
All the other mechanisms you mentioned (If-Modified-Since, ETag, If-Match, etc.) work with caches to ensure the resource is in some known state. They only work if you definitely hold a valid state of the resource; you can think of it as optimistic locking. You can make requests conditional on whether the resource did, or did not, change, but you have to know which of the two you are expecting.
You could potentially misuse If-Modified-Since as you describe, but max-age=0 communicates your intent better.
Also note that, by design, there may be multiple caches along the way, not just your server-side cache. Most often the client caches as well, and there may be other transparent caches in between.

According to RFC 7234, section 5.2.1.4, the no-cache request directive appears to best fit my need.
The "no-cache" request directive indicates that a cache MUST NOT use a
stored response to satisfy the request without successful validation
on the origin server.
Nothing is said about subsequent requests, which is exactly what I want. There is also a no-cache response directive in section 5.2.2.2, but that one affects subsequent requests as well, which is more than I need.
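For illustration, the request directive is sent only with the one request that needs fresh data (host and path are hypothetical):

GET /orders/1234 HTTP/1.1
Host: api.example.com
Cache-Control: no-cache

Any cache along the way must then validate with the origin server before reusing a stored response, while later requests without the header are served from cache as usual.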

Related

How to manage HATEOAS links when the server is the client?

I'm learning about HATEOAS. The backend server I'm working on will use a third-party REST API that uses HATEOAS. That API has an endpoint that returns the URL for each resource, and it also returns the related resource links with regular requests.
But I'm wondering what's a good way to manage these links on the server to avoid hardcoding them. For example, if the third party changes the URL of a resource, how will the server detect that change? Are there any standard practices for managing HATEOAS resource links?
Possible ways I can think of
When the server starts, get all the resource URLs and cache them. Whenever the third-party API needs to be called, reuse these cached URLs. Whenever there is a 404 or a related error, update the resource URL. Or update the URLs periodically at intervals.
Get the resource URL each time before calling the endpoint. Simplest, but it essentially doubles the number of requests.
But neither sounds like a robust approach.
While discovery is generally a good thing and should allow a HATEOAS system to introduce changes in ways that hardcoded URLs don't, if URLs start breaking arbitrarily I would still consider that a major issue.
You should be able to store URLs/links on your side and have some expectation that they keep working.
There are some mechanisms that deal with changes though:
The server should return 301 / 308 redirects if a resource moved. If that happens, you should also update your stored references (see the sketch after this list).
The server can emit Sunset or Deprecation headers. See: https://www.rfc-editor.org/rfc/rfc8594
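For example (hypothetical URLs), a moved resource might surface like this, at which point you would update your stored link to the value of Location:

GET /v1/customers/42 HTTP/1.1
Host: api.example.com

HTTP/1.1 308 Permanent Redirect
Location: https://api.example.com/v2/customers/42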
Those are more general answers, but ultimately the existence of best practices does not mean that vendors will abide by them. With that in mind I think your best bet is to try and find out what the deprecation policy is of your vendor and see what they recommend.
Use a cached resource if it is valid; request a refresh when you don't have a valid local copy.
RFC 7234 defines the caching semantics of HTTP.
Ideally, you don't implement the caching rules yourself, but instead you use a general purpose cache.
In its ideal form, your bespoke implementation is talking to a headless browser, and the headless browser worries about the caching rules for you.
In theory, you need the initial URL to start the process, and everything else comes from that.
Each resource you get from the server should include links to the other edges of the service's graph for that resource.
So, once you get the initial resource, all of the rest come automatically.
That said, it's not untoward to have "well known" entry points that are, ideally, unchanging URLs. But in the end, those are just "bookmarks", and not necessarily guaranteed endpoints.
Consider a shopping site such as Amazon. Outside of amazon.com, you don't know any of their URLs. They're all provided on the various forms and pages, and the human simply navigates the site. Those URLs can be changing all the time, and no one would know. With HATEOAS, it's up to the machine to follow the links, rather than a human. But the process of navigation is the same.
As others have mentioned, the idea of caching a root resource has merit. You then rely on the caching headers to tell you how often you have to refresh the links.
But that said, operationally there's no difference between following a normal link and following a cached link. Underneath, the cached resource loads faster, but you still need to "follow the link", because that's where the caching behavior kicks in. This is different from assuming the link is good, i.e. assuming you already know the result of a resource lookup. Your application follows the link. Always. The underlying infrastructure is responsible for making it efficient.
So your code should not, say, load up a root resource, stuff a map full of links, and then assume they're good. Rather, the code should request the root resource, perhaps as a Map of links (datatypes for the win), and let the next layer handle the details, because it all depends on the type of caching involved. Some responses have coded durations during which no follow-up is necessary. For others, you make the request anyway and the server tier responds "nothing changed", so you can use your local copy, but you're still required to ask in the first place.
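A minimal sketch of that "ask anyway" flow, assuming the server supports ETags (all values here are hypothetical):

GET / HTTP/1.1
Host: api.example.com
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"

The 304 costs a round trip but transfers no body, and the cached link map remains usable.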
Those are implementation details that the SERVER mandates (not the client). It's a server contract. If they want you pinging them each and every time, so be it. That's the contract they're presenting to you, and if you want to be a Good Citizen, you should honor that contract.
Ideally, the server makes good decisions on these kinds of issues for the sake of efficiency, but in the end it's really up to them.
The client has to go along. The client in a HATEOAS system cedes a lot to the server. They're simply not decisions for the client to make.

Should this be a GET or a PATCH request?

If I make a request to get some data from a database without sending any updates, but I'm marking the record in the database to say the data has been fetched, does that make it a PATCH request or a GET?
Short answer: No, it is still a GET.
RFC 7231 defines safe:
Request methods are considered "safe" if their defined semantics are essentially read-only....
This definition of safe methods does not prevent an implementation
from including behavior that is potentially harmful, that is not
entirely read-only, or that causes side effects while invoking a safe
method. What is important, however, is that the client did not
request that additional behavior and cannot be held accountable for
it. For example, most servers append request information to access
log files at the completion of every response, regardless of the
method, and that is considered safe even though the log storage might
become full and crash the server. Likewise, a safe request initiated
by selecting an advertisement on the Web will often have the side
effect of charging an advertising account.
So if the client is trying to retrieve a current representation of the resource,
the fact that your implementation happens to do a bit of bookkeeping on the side doesn't change the semantics of the request.
Part of the point of an HTTP front end is that clients are completely insulated from the underlying implementation details of the server -- everything looks like a dumb web site from the outside.
The HTTP spec is rather clear on that if you read through the definition of safe:
Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource. Likewise, reasonable use of a safe method is not expected to cause any harm, loss of property, or unusual burden on the origin server.
This definition of safe methods does not prevent an implementation from including behavior that is potentially harmful, that is not entirely read-only, or that causes side effects while invoking a safe method. What is important, however, is that the client did not request that additional behavior and cannot be held accountable for it. For example, most servers append request information to access log files at the completion of every response, regardless of the method, and that is considered safe even though the log storage might become full and crash the server. Likewise, a safe request initiated by selecting an advertisement on the Web will often have the side effect of charging an advertising account.
...
So a state change through a GET-triggered download is fine as long as the client is not aware of that state change.
In certain situations, though, exposing a state change via GET may be risky. Just think of a crawler that invokes a couple of URIs that order some pizza or the like. According to the spec this is fine, and the crawler must not be made accountable for that order. The spec is simply telling you that the fault would be yours.
That being said, you can always use POST if you feel uncomfortable with certain HTTP operations, as POST literally allows you to process the request according to the resource's own semantics.
Which leads me to the next point: re-thinking your design. Returning a document that includes its own state is somewhat strange, in my opinion. Usually such information is meta-data about a document, not part of the resource itself. Here you could either use HTTP headers to communicate such information to the client, or design the state of that resource as a further resource in its own right and give the client a link to look it up if it is interested.
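As a hedged sketch of that last option, the resource could point at a separate state resource via a standard Link header (RFC 8288); the paths and relation name here are made up:

GET /documents/42 HTTP/1.1
Host: api.example.com

HTTP/1.1 200 OK
Content-Type: application/json
Link: </documents/42/fetch-status>; rel="https://api.example.com/rels/fetch-status"

{"title": "...", "body": "..."}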
Anyway, while not elegant, performing a state change on retrieving a resource via GET is not forbidden. I would, though, invest a couple more thoughts in whether you want to include the state within the resource itself or expose it as its own resource.

RESTful API - how to include id/token/... in each request?

I developed a mobile app that needs to access and update data on a server. I'd like to include e.g. the device ID and some token in each request.
I am including these in the body at the moment, so I have only POST requests, even when asking to read data from the server. However, a request to read data should be a GET, so how do I include these pieces of information? Should I just add a body to a GET request? Should I rather add some headers? If so, can I just create custom headers with any name? Thank you for your guidance.
Your FCM token and device ID are really authentication credentials for the request. In HTTP, you typically send those in the Authorization header, with a scheme that indicates to the service how to interpret them.
In your case, you could use bearer tokens in the HTTP Authorization header.
While bearer tokens are often used with JWTs, they are not required to be in that specific format.
You could just concatenate the FCM token and the device ID, the way the Basic authentication scheme concatenates username and password (sketched below).
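A sketch with made-up values: the client base64-encodes "fcm-token:device-id", just as Basic auth encodes "user:password", and sends it as a bearer token; the server splits the two parts after decoding.

GET /user/data HTTP/1.1
Host: api.example.com
Authorization: Bearer ZmNtLXRva2VuOmRldmljZS1pZA==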
BTW, it's not recommended to use a body on a GET request, since some proxies may not retain it.
Well, REST is basically just a generalization of the concepts that have been used for years in the browser-based web. By applying these concepts consistently in your applications, you gain freedom to evolve the server side while gaining robustness to changes on the client side. However, in order to benefit from such strong properties, a certain number of constraints need to be followed consistently, like adhering to the rules of the underlying transport protocol or relying on HATEOAS to drive application state. Any out-of-band information needed to interact with the service will lead to coupling and therefore has the potential to either break clients or prevent servers from changing in the future.
A common misconception in REST architecture design is that URIs should be meaningful and express semantics to the client. However, in a REST architecture the URI is just a pointer to a resource, which a client should never parse. The decision whether to invoke the URI should be based solely on the accompanying link relation name, which may be further described in either the media type or common standards. E.g., on a pageable collection, link relations like prev, next, first, or last give a client the option to page through the collection. The actual structure of the URI is therefore not important to REST at all. Over-engineered URIs might further lead to typed resources. That's why I don't actually like the term "RESTful URL": what would a non-RESTful URL look like?
While sending everything via POST requests is technically a valid option, it has some drawbacks to consider. IANA maintains a list of available HTTP methods you might use. Each method conveys different promises and semantics. E.g., a client invoking a GET operation on a server should be safe to assume that invoking the resource does not cause any state change (safe) and that, in case of network issues, the request can be reissued without further consideration (idempotent). These are very important benefits to, e.g., Web crawlers. Besides that, intermediary nodes can determine, based on the request method and the resulting response, whether the response can be cached. While this is not necessarily an issue in terms of decoupling clients from servers, it helps take unnecessary workload away from the server itself, especially when resource state rarely changes, improving the scalability of the whole system.
POST, on the other hand, does not convey such properties. On sending a POST request to retrieve data, the client can't be sure whether the request actually led to changes in the resource's state. On a network issue, the request might have reached the server and created a new resource while the response got lost midway, leaving the client uncertain whether it can simply resend the request. Also, responses to POST operations are not cacheable by default, only after explicitly adding freshness information. A POST invocation requests that the target resource process the provided representation according to the resource's own semantics. As literally anything can be sent to the server, it is important that the server teaches the client how a request should look. In HTML, for example, this is done via Web forms, where a user can fill data into certain input fields and then send the data to the server by clicking a submit button. The same concept can be applied to mobile or REST applications as well. Either reusing HTML forms or defining your own application/vnd.company-x.forms+json media type, whose description is made public (or registered with IANA), can help you here.
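To make that concrete, here is a sketch of a response carrying such a form, assuming the hypothetical application/vnd.company-x.forms+json media type from above (the field names are made up):

HTTP/1.1 200 OK
Content-Type: application/vnd.company-x.forms+json

{
  "form": {
    "target": "/orders",
    "method": "POST",
    "fields": [
      {"name": "productId", "type": "string", "required": true},
      {"name": "quantity", "type": "number", "required": true}
    ]
  }
}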
The actual question of where to include certain data is, unfortunately, too generic to give a short answer. It further depends on whether the data should be shareable or has security-related concerns. While parameters may be passed to the server via URL parameters (query, matrix, path) to a certain extent, this is probably not the best option in general, even though query parameters are encrypted in TLS-protected interactions. It is convenient, though, if the URI should be pastable without losing information; such a URI, of course, shouldn't contain security-related data. Security-related information should almost always be passed in HTTP headers or in the actual payload itself.
Usually you should distinguish between content and meta-data describing the content. While the content should be the actual payload of the request/response, any meta-data describing it should go into the headers. Think of an image you want to transfer: as you don't want to mess with the bytes of the image, you simply append the image name, the compression format, and further properties describing how to convert the bytes back to an image representation in the headers (see the sketch below). This distinction works best for standardized representation formats, as you need to stay within the capabilities of the spec to guarantee interoperability. Even there, though, things may get fuzzy: e.g., in the area of EDI there exist a couple of well-defined standards like EDIFACT, TRADACOMS, and so forth, which can be used to exchange different message formats like invoices, orders, and order responses, yet different ERP systems speak different dialects, and this is where things get complicated and messy.
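For the image example, the split might look like this (values are illustrative), with the raw bytes as the payload and everything describing them in headers:

HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 54321
Content-Disposition: inline; filename="diagram.png"

<binary PNG bytes>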
If you are in control of your representation format, perhaps because you did not standardize it or have defined it only vaguely so far, it might be even harder to decide whether to put something inside your document or append it via headers. Here it solely depends on your design. I have also seen representations that defined their own header sections within the payload, thereby recreating a SOAP-like envelope-header-body structure.
About your question whether you can create a custom header for your requirement: my answer is yes.
As mentioned above, you can use the standard Authorization header to send the token in each request. The other alternative is defining a custom header; however, you will then have to implement logic on the server side to support that custom header.

HTTP verb to invoke services / methods

What is the best practice for defining a web service that represents a non-REST command invocation?
For REST, we basically use POST to create new record(s), GET to retrieve record(s), PUT to update record(s), and DELETE to remove record(s). Which HTTP verb should I use if I just want to invoke some other non-resource function, for example to flush a system cache?
Which HTTP verb should I use if I just want to invoke some other non-resource function, for example to flush a system cache?
HTTP request methods should be selected based on their alignment with their defined semantics.
The most important of these is to determine whether or not the semantics are safe:
Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource. Likewise, reasonable use of a safe method is not expected to cause any harm, loss of property, or unusual burden on the origin server.
Advertising a safe link invites consumers to pre-fetch a link, or to crawl and index the representation found there.
If having Google and a billion of her closest friends flushing your system cache sounds expensive, then you probably don't want a safe method.
PUT and PATCH are unsafe methods with semantics of manipulating representations. So if you had a schema that described a system cache, a client might PUT a representation of an empty cache in the entity body and send that to you, whereupon you could flush the cache. You could achieve a similar thing with PATCH, sending a list of the edits needed to make the change (sketched below).
Both of these rely on the illusion that your resources are just documents. I GET a representation of your resource, I load that into my generic editor, make changes, send my edited representation back to you, and then it's up to you to manifest those changes (or not).
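A sketch of the PATCH variant, assuming (hypothetically) that the server accepts the standard JSON Patch format (RFC 6902) and models the cache as a document with an entries array:

PATCH /system/cache HTTP/1.1
Content-Type: application/json-patch+json

[
  {"op": "replace", "path": "/entries", "value": []}
]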
But they aren't required -- if you want to simply document that
PUT /df1645af-f960-4cc4-ad7a-d0ddd29903f8
Content-Length: 0
has the side effect of flushing the system cache, the REST Police aren't going to come after you just because you've introduced a bit of RPC into the mix.
Of course, if you were doing this with HTML, then your only choice would be POST.
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
Which is to say, POST is always an option.
It's easy enough to imagine the flow -- you load up some bookmark, follow a system cache link, find a form with a flush cache button, and submit. The browser would create the request as described in the form elements and submit it.
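The request that submission would produce might look like this (path and field name are hypothetical):

POST /system/cache/flush HTTP/1.1
Host: api.example.com
Content-Type: application/x-www-form-urlencoded

action=flush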
So that's going to be fine too. And the REST police won't bother you for that, because that protocol is actually RESTful.
If those answers are unsatisfying, or if you are just surveying the space to know what options are available, you can review the HTTP Method Registry. To be honest, I've never found anything there I've wanted to use. But if WebDAV is your jam....

PUT POST being idempotent (REST)

I don't quite get how the HTTP verbs are defined as idempotent. All I've read is that GET and PUT are idempotent and POST is not. But you could create a REST API using POST that doesn't change anything (in the database, for example), or create a REST API using PUT that makes a change every time it is called.
Sure, that is probably the wrong way to do things, but if it can be done, why is PUT labeled as idempotent (or POST as not) when it is up to the implementation? I'm not challenging the idea; I'm probably missing something, and I'm asking to clear up my understanding.
EDIT:
I guess one way to put my question is: what would be the problem if I used PUT to make a non-idempotent call, or POST to make an idempotent one?
You are right in pointing out that there is nothing inherent in the HTTP protocol that enforces the idempotent attribute of methods/verbs like PUT and DELETE. HTTP, being a stateless protocol, retains no information or status about each request the user makes; every single request is treated as independent.
To quote Wikipedia on the idempotent attribute of HTTP methods (emphasis mine):
Note that whether a method is idempotent is not enforced by the
protocol or web server. It is perfectly possible to write a web
application in which (for example) a database insert or other
non-idempotent action is triggered by a GET or other request. Ignoring
this recommendation, however, may result in undesirable consequences,
if a user agent assumes that repeating the same request is safe when
it isn't.
So yes, it is possible to deviate from the conventional implementation and roll out things like a non-changing POST implementation, a non-idempotent PUT, etc., probably with no significant, life-threatening technical problems. But you risk upsetting other programmers consuming your web services, who may think you don't know what you're doing.
Here's an important quote from RFC 2616 on HTTP methods being safe (emphasis mine):
Implementors should be aware that the software represents the user in
their interactions over the Internet, and should be careful to allow
the user to be aware of any actions they might take which may have an
unexpected significance to themselves or others.
In particular, the convention has been established that the GET and
HEAD methods SHOULD NOT have the significance of taking an action
other than retrieval. These methods ought to be considered "safe".
This allows user agents to represent other methods, such as POST, PUT
and DELETE, in a special way, so that the user is made aware of the
fact that a possibly unsafe action is being requested.
Naturally, it is not possible to ensure that the server does not
generate side-effects as a result of performing a GET request; in
fact, some dynamic resources consider that a feature. The important
distinction here is that the user did not request the side-effects, so
therefore cannot be held accountable for them.
UPDATE: As pointed out by Julian, RFC 2616 has been replaced by RFC 7231. Here's the corresponding section.
So when you publish a web service as a PUT method, and I submit a request that looks like:
PUT /users/<new_id> HTTP/1.1
Host: example.com
I will expect a new user resource to be created. Likewise, if my request looks like:
PUT /users/<existing_id> HTTP/1.1
Host: example.com
I will expect the corresponding existing user to be updated. If I repeat the same request by submitting the form multiple times, please don't pop up a warning dialog (because I like the established convention).
Conversely, as a consumer of a POST web service, I will expect requests like:
POST /users/<existing_id> HTTP/1.1
Host: example.com
to update the corresponding existing user, while a request that looks like:
POST /users/<new_id> HTTP/1.1
Host: example.com
to raise an error because the URL doesn't exist yet.
Indeed, an implementation can do anything it wants. However, if that is incorrect according to the protocol spec, surprising things might happen (such as a library or intermediary repeating a PUT if the first attempt failed).
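That retry case is exactly where the label matters. If the response to a PUT is lost, a client or intermediary may blindly resend it, because replaying it is specified to leave the same final state (hypothetical request):

PUT /users/42 HTTP/1.1
Host: example.com
Content-Type: application/json

{"name": "Alice", "email": "alice@example.com"}

Replaying an equivalent POST against a collection, by contrast, could create a second user, which is why intermediaries won't retry it automatically.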
Hope this link helps: HTTP Method idempotency
Be careful when dealing with safe methods as well: if a seemingly safe method like GET changes a resource, it is possible that a caching proxy between you and the server will cache the response. Another client who wants to change the resource through the same URL (like http://example.org/api/article/1234/delete) will then not reach the server but get the response directly from the cache. Non-safe (and non-idempotent) methods will never be cached by middleware proxies.