Are REST Webservices stateful? - rest

We are talking about REST Webservices in my university right now.
There came the terms "Application" and "Interaction" up.
I totally understand Interaction, but not quite Application.
Does anybody know what Application means in the context of REST Webservices?
There came the thesis "REST Webservices" are stateful up.
This was explained with self describing messages.
The argument was, that in the self describing messages the required data format is described.
I do not find any proof of that on the Internet, rather the opposite.
Can anybody explain both terms and explain to me why REST Webservices should be stateful?
Cheers,
Henrik

Heuristic: anytime you are thinking about REST, consider the web as an example. How does it work with web pages and browsers?
State:
Resources (web pages) are stateful. The representations (bytes) can change over time.
Messages (http requests and responses) are stateless - you don't need to know anything about "session" or "context" to understand the message. Alternative spelling - the meaning of a request is completely independent of any previous request.
Fielding discusses applications and application state when discussing the REST data view
... looking-up a word in an on-line dictionary is one application, as is touring through a virtual museum, or reviewing a set of class notes to study for an exam. Each application defines goals for the underlying system, against which the system's performance can be measured....
An application's state is therefore defined by its pending requests, the topology of connected components (some of which may be filtering buffered data), the active requests on those connectors, the data flow of representations in response to those requests, and the processing of those representations as they are received by the user agent.
An application reaches a steady-state whenever it has no outstanding requests; i.e., it has no pending requests and all of the responses to its current set of requests have been completely received or received to the point where they can be treated as a representation data stream. For a browser application, this state corresponds to a "web page," including the primary representation and ancillary representations, such as in-line images, embedded applets, and style sheets. Fielding, 2000

Related

What can go wrong if we do NOT follow RESTful best practices?

TL;DR : scroll down to the last paragraph.
There is a lot of talk about best practices when defining RESTful APIs: what HTTP methods to support, which HTTP method to use in each case, which HTTP status code to return, when to pass parameters in the query string vs. in the path vs. in the content body vs. in the headers, how to do versioning, result set limiting, pagination, etc.
If you are already determined to make use of best practices, there are lots of questions and answers out there about what is the best practice for doing any given thing. Unfortunately, there appears to be no question (nor answer) as to why use best practices in the first place.
Most of the best practice guidelines direct developers to follow the principle of least surprise, which, under normal circumstances, would be a good enough reason to follow them. Unfortunately, REST-over-HTTP is a capricious standard, the best practices of which are impossible to implement without becoming intimately involved with it, and the drawback of intimate involvement is that you tend to end up with your application being very tightly bound to a particular transport mechanism. So, some people (like me) are debating whether the benefit of "least surprise" justifies the drawback of littering the application with REST-over-HTTP concerns.
A different approach examined as an alternative to best practices suggests that our involvement with HTTP should be limited to the bare minimum necessary in order to get an application-defined payload from point A to point B. According to this approach, you only use a single REST entry point URL in your entire application, you never use any HTTP method other than HTTP POST, never return any HTTP status code other than HTTP 200 OK, and never pass any parameter in any way other than within the application-specific payload of the request. The request will either fail to be delivered, in which case it is the responsibility of the web server to return an "HTTP 404 Not Found" to the client, or it will be successfully delivered, in which case the delivery of the request was "HTTP 200 OK" as far as the transport protocol is concerned, and anything else that might go wrong from that point on is exclusively an application concern, and none of the transport protocol's business. Obviously, this approach is kind of like saying "let me show you where to stick your best practices".
Now, there are other voices that say that things are not that simple, and that if you do not follow the RESTful best practices, things will break.
The story goes that for example, in the event of unauthorized access, you should return an actual "HTTP 401 Unauthorized" (instead of a successful response containing a json-serialized UnauthorizedException) because upon receiving the 401, the browser will prompt the user of credentials. Of course this does not really hold any water, because REST requests are not issued by browsers being used by human users.
Another, more sophisticated way the story goes is that usually, between the client and the server exist proxies, and these proxies inspect HTTP requests and responses, and try to make sense out of them, so as to handle different requests differently. For example, they say, somewhere between the client and the server there may be a caching proxy, which may treat all requests to the exact same URL as identical and therefore cacheable. So, path parameters are necessary to differentiate between different resources, otherwise the caching proxy might only ever forward a request to the server once, and return cached responses to all clients thereafter. Furthermore, this caching proxy may need to know that a certain request-response exchange resulted in a failure due to a particular error such as "Permission Denied", so as to again not cache the response, otherwise a request resulting in a temporary error may be answered with a cached error response forever.
So, my questions are:
Besides "familiarity" and "least surprise", what other good reasons are there for following REST best practices? Are these concerns about proxies real? Are caching proxies really so dumb as to cache REST responses? Is it hard to configure the proxies to behave in less dumb ways? Are there drawbacks in configuring the proxies to behave in less dumb ways?
It's worth considering that what you're suggesting is the way that HTTP APIs used to be designed for a good 15 years or so. API designers are tending to move away from that approach these days. They really do have their reasons.
Some points to consider if you want to avoid using ReST over HTTP:
ReST over HTTP is an efficient use of the HTTP/S transport mechanism. Avoiding the ReST paradigm runs the risk of every request / response being wrapped in verbose envelopes. SOAP is an example of this.
ReST encourages client and server decoupling by putting application semantics into standard mechanisms - HTTP and XML/JSON (or others data formats). These protocols and standards are well supported by standard libraries and have been built up over years of experience. Sure, you can create your own 'unauthorized' response body with a 200 status code, but ReST frameworks just make it unnecessary so why bother?
ReST is a design approach which encourages a view of your distributed system which focuses on data rather than functionality, and this has a proven a useful mechanism for building distributed systems. Avoiding ReST runs the risk of focusing on very RPC-like mechanisms which have some risks of their own:
they can become very fine-grained and 'chatty'
which can be an inefficient use of network bandwidth
which can tightly couple client and server, through introducing stateful-ness and temporal coupling beteween requests.
and can be difficult to scale horizontally
Note: there are times when an RPC approach is actually a better way of breaking down a distributed system than a resource-oriented approach, but they tend to be the exceptions rather than the rule.
existing tools for developers make debugging / investigations of ReSTful APIs easier. It's easy to use a browser to do a simple GET, for example. And tools such as Postman or RestClient already exist for more complex ReST-style queries. In extreme situations tcpdump is very useful, as are browser debugging tools such as firebug. If every API call has application layer semantics built on top of HTTP (e.g. special response types for particular error situations) then you immediately lose some value from some of this tooling. Building SOAP envelopes in PostMan is a pain. As is reading SOAP response envelopes.
network infrastructure around caching really can be as dumb as you're asking. It's possible to get around this but you really do have to think about it and it will inevitably involve increased network traffic in some situations where it's unnecessary. And caching responses for repeated queries is one way in which APIs scale out, so you'll likely need to 'solve' the problem yourself (i.e. reinvent the wheel) of how to cache repeated queries.
Having said all that, if you want to look into a pure message-passing design for your distributed system rather than a ReSTful one, why consider HTTP at all? Why not simply use some message-oriented middleware (e.g. RabbitMQ) to build your application, possibly with some sort of HTTP bridge somewhere for Internet-based clients? Using HTTP as a pure transport mechanism involving a simple 'message accepted / not accepted' semantics seems overkill.
REST is intended for long-lived network-based applications that span multiple organizations. If you don’t see a need for the constraints, then don’t use them. -- Roy T Fielding
Unfortunately, there appears to be no question (nor answer) as to why use best practices in the first place.
When in doubt, go back to the source
Fielding's dissertation really does quite a good job at explaining how the REST architectural constraints ensure that you don't destroy the properties those constraints are designed to protect.
Keep in mind - before the web (which is the reference application for REST), "web scale" wasn't a thing; the notion of a generic client (the browers) that could discover and consume thousands of customized applications (provided by web servers) had not previously been realized.
According to this approach, you only use a single REST entry point URL in your entire application, you never use any HTTP method other than HTTP POST, never return any HTTP status code other than HTTP 200 OK, and never pass any parameter in any way other than within the application-specific payload of the request.
Yup - that's a thing, it's called RPC; you are effectively taking the web, and stripping it down to a bare message transport application that just happens to tunnel through port 80.
In doing so, you have stripped away the uniform interface -- you've lost the ability to use commodity parts in your deployment, because nobody can participate in the conversation unless they share the same interpretation of the message data.
Note: that's doesn't at all imply that RPC is "broken"; architecture is about tradeoffs. The RPC approach gives up some of the value derived from the properties guarded by REST, but that doesn't mean it doesn't pick up value somewhere else. Horses for courses.
Besides "familiarity" and "least surprise", what other good reasons are there for following REST best practices?
Cheap scaling of reads - as your offering becomes more popular, you can service more clients by installing a farm of commodity reverse-proxies that will serve cached representations where available, and only put load on the server when no fresh representation is available.
Prefetching - if you are adhering to the safety provisions of the interface, agents (and intermediaries) know that they can download representations at their own discretion without concern that the operators will be liable for loss of capital. AKA - your resources can be crawled (and cached)
Similarly, use of idempotent methods (where appropriate) communicates to agents (and intermediaries) that retrying the send of an unacknowledged message causes no harm (for instance, in the event of a network outage).
Independent innovation of clients and servers, especially cross organizations. Mosaic is a museum piece, Netscape vanished long ago, but the web is still going strong.
Of course this does not really hold any water, because REST requests are not issued by browsers being used by human users.
Of course they are -- where do you think you are reading this answer?
So far, REST works really well at exposing capabilities to human agents; which is to say that the server side is so ubiquitous at this point that we hardly think about it any more. The notion that you -- the human operator -- can use the same application to order pizza, run diagnostics on your house, and remote start your car is as normal as air.
But you are absolutely right that replacing the human still seems a long ways off; there are various standards and media types for communicating semantic content of data -- the automated client can look at markup, identify a phone number element, and provide a customized array of menu options from it -- but building into agents the sorts of fuzzy intelligence needed to align offered capabilities with goals, or to recover from error conditions, seems to be a ways off.

RESTful web requests and user activity tracking websites

Someone asked me this question a couple of days ago and I didn't have an answer:
As HTTP is a stateless protocol. When we open www.google.com, can it
be called a REST call?
What I think:
When we make a search on google.com all info is passed through cookie and URL parameters. It looks like a stateless request.
But the search results are not independent of user's past request. Search results are specific to user interest and behavior. Now, it doesn't look like a stateless request.
I know this is an old question and I have read many SO answers like Why HTTP is a stateless protocol? but I am still not able to understand what happens when user activity is tracked like on google or Amazon(recommendations based on past purchases) or any other user activity based recommendation websites.
Is it RESTful or is it RESTless?
What if I want to create a web app in which I use REST architecture and still provide user-specific responses?
HTTP is stateless, however the Google Application Layer is not. The specific Cookies and their meaning is part of the Application Layer.
Consider the same with TCP/IP. IP is a stateless protocol, but TCP isn't. The existence of state in TCP embedded in IP packets does not mean that IP protocol itself has a state.
So does that make it a REST call? No.
Although HTTP is stateless & I would suspect that www.google.com when requested with cookies disabled, the results would be the same for each request, making it almost stateless (Google still probably tracks IP to limit request frequency).
But the Application Layer is not stateless. One of the principles of REST is that the system does not retain state data about about the client between requests for the purpose of modifying the responses. In the case of Google, that clearly is not happening.
It seems that the meaning of "stateless" is being (hypothetically) taken beyond its practical expression.
Consider a web system with no DB at all. You call a (RESTful) API, you always get the exactly the same results. This is perfectly stateless... But this is perfectly not a real system, either.
A real system, in practically every implementation, holds data. Moreover, that data is the "resources" that RESTful API allows us to access. Of course, data changes, due to API calls as well. So, if you get a resource's value, change its value, and then get its value again, you will get a different value than the first read; however, this clearly does not say that the reads themselves were not stateless. They are stateless in the sense that they represent the very same action (or, more exact, resource) for each call. Change has to be manually done, using another RESTful API, to change the resource value, that will then be reflected in the next call.
However, what will be the case if we have a resource that changes without a manual, standard API verb?
For example, suppose that we have a resource that counts the number of times some other resource was accessed. Or some other resource that is being populated from some other third party data. Clearly, this is still a stateless protocol.
Moreover, in some sense, almost any system -- say, any system that includes an authentication mechanism -- responds differently for the same API calls, depending, for example, on the user's privileges. And yet, clearly, RESTful systems are not forbidden to authenticate their users...
In short, stateless systems are stateless for the sake of that protocol. If Google tracks the calls so that if I call the same resource in the same session I will get different answers, then it breaks the stateless requirement. But so long as the returned response is different due to application level data, and are not session related, this requirement is not broken.
AFAIK, what Google does is not necessarily related to sessions. If the same user will run the same search under completely identical conditions (e.g., IP, geographical location, OS, browser, etc.), they will get the very same response. If a new identical search will produce different results due to what Google have "learnt" in the last call, it is still stateless, because -- again -- that second call would have produced the very same result if it was done in another session but under identical conditions.
You should probably start from Fielding's comments on cookies in his thesis, and then review Fielding's further thoughts, published on rest-discuss.
My interpretation of Fielding's thoughts, applied to this question: no, it's not REST. The search results change depending on the state of the cookie header in the request, which is to say that the representation of the resource changes depending on the cookie, which is to say that part of the resource's identifier is captured in the cookie header.
Most of the problems with cookies are due to breaking visibility,
which impacts caching and the hypertext application engine -- Fielding, 2003
As it happens, caching doesn't seem to be a big priority for Google; the representation returned to be included a cache control private header, which restricts the participation by intermediate components.

HATEOAS and server state

Trying to understand REST HATEOAS:
Suppose I have a service that has state; they are: initial, ready, running. I have a client that connects to the service, obtains a page with links that allow it to mutate the service state.
It uses one of the links to change the service's state and obtains another page with new links.
As long as there is 1 client, the state the client holds is identical with the service. But if there is a second client and it changes the service's state, the first client's representation is stale.
How is this resolved in HATEOAS? From what I've read it seems that REST is not applicable and I should maybe look at something else. If so, what?
Thanks!
This is not resolved by HATEOS (entirely). As REST is stateless this is kind of a paradox use case to keep state in client and server aligned.
Assuming I understand your requirements, yes, you're state in client 1 is stale and not the same as the one on the server. But what if the client would make a periodic call to the server to see whether some other client changed it? If so, with HATEOAS you could provide a link to serve the current state and omit the link to change the state.
#Kay - Thanks for answering.
I'm going to try to answer my own question. I realized after reading your answer that the "application" in HATEOAS is really the virtual application the client experiences when it retrieves and processes the resources it gets from the server. Its states are the pages (resources) it transitions between. The server (service?) may have its own state but it's not the same as the client's.
As long as this distinction is kept in mind, it is not unRESTful to have stale links in client 1. The server simply responds with new links reflecting its own state. And the client makes new transitions based on the updated links.
Still trying to understand. If I have it wrong, I'd appreciate some help.
Thanks!
The stateless requirement of REST refers to the ability of the server to understand and process the client request independent from any previous interactions it has made with said client. In other words, the client should be able to send a request "out of the blue" to the server (I.e., without a session saved on said server) and have the request processed. Hence there isn't a concept of login and logout in a purely RESTful architecture.
That's a different constraint than HATEOAS. Basically, "hypermedia as the engine of application state" means that all state is conveyed through the media type being used and not the connection itself. The client can (and often does) keep its own state, and can request snapshots of the state of resources from the server through resource representations (a.k.a media types).
If you want to be notified when a resource changes state, REST is (probably) the wrong choice. You'd likely want to use a different application protocol than, say, HTTP.
As Fielding says: it's not REST without HATEOAS. Don't call your service REST if it's ignore HATEAOS and stateful service can not be REST. You understood HATEOA. The server provides hyperlinks for the client which should be use to change the state located at the CLIENT SIDE.
To solve your problem: omit tend server from any state information. It will easier your life. Then implement REST as using the Richardson Maturity Model while consider information's from here.

A RESTful approach to data synchronization

Assume the following scenario A web application serves up resources through a RESTful API. A number of clients consume this API. The goal is to keep the data on the clients synchronized with the web application (in both directions).
The easiest way to do this is to ask the API if any of the resources have changed since the client last synchronized with the API. This means that the client needs to ask the API for the appropriate resources accompanied by timestamp (to see if the data needs to be updated). This seems to me like the approach with the least overhead in terms of needless consumption of bandwidth.
However, I have the feeling that this approach has a few downsides in terms of design and responsibilities. For example, the API shouldn't have to deal with checking whether the resources are out of date. It seems that the only responsibility of the API should be to serve up the resources when asked without having to deal with the updating aspect. By following this second approach, the client would ask for a lot of data every time it wants to update its data to keep it synchronized with the web application. In other words, the client would check whether the data it got back is newer than the locally stored data. If this process takes place every few minutes, this might become a significant burden for the system.
Am I seeing this correctly or is there a middle road that I am overlooking?
This is a pretty common problem, and a RESTful approach can help you solve it. HTTP (the application protocol typically used to build RESTful services) supports a variety of techniques that can be used to keep API clients in sync with the data on the server side.
If the client receives a Last-Modified or E-Tag header in a HTTP response, it may use that information to make conditional GET calls in the future. This allows the server to quickly indicate with a 304 – Not Modified response that the client’s previously stored representation of the resource is still valid and accurate. This will allow the server (or even better, an intermediate proxy or cache server) to be as efficient as possible in how it responds to the client’s requests, potentially reducing costly round-trips to a back-end data store.
If a response contains a Last-Modified header and the client wishes to take advantage of the performance optimization available with it, they must include an If-Modified-Since directive in a subsequent GET call to the same URI, passing in the same timestamp value it received. This instructs the server to only GET the information from the authoritative back-end source if it knows it has changed since that time. Your server will have to be built to support this technique, of course.
A similar principle applies to E-Tag headers. An E-Tag is a simple hash code representing a specific state of the resource at a particular point in time. If the resource changes in any way, so does its E-Tag value. If the client sees an E-Tag in a response it should pass it in subsequent GET requests to the same URI, thereby allowing the server to quickly determine if the client has an up-to-date representation of the resource.
Finally, you should probably look at the long polling technique to reduce the number of repeated GET requests issued by your clients to the server. In essence, the trick is to issue very long GET requests to the server to watch for server data changes. The GET will not return a response until either the data has changed or the very long timeout fires. If the latter, the client just re-issues the same long-lived request to watch for changes again. See also topics like Comet and Web Sockets which are similar in approach.

Event Based interaction style in REST

I am currently struggling with a design issue involving REST. The application I am designing has a requirement to send out events and also support pub/sub style of interaction. I am unable to come up with a design to provide these interaction style without breaking the REST’s “Stateless interaction” constraint. I am not against polling as some people seem to be (polling sucks) , but my application demands a event based and pub/sub style of interaction (polling is not an option for me). So, my question is:
Can I design a RESTful application that supports event based and pub/sub interactions without breaking the REST contraint?
Is REST style suitable for this kind of interaction style?
I'd recommend the Distributed Observer Pattern by Duncan Cragg as a good read (bit difficult to grok but worth the effort).
As others have indicated its likely you'll need to use polling but as you rightly say subscribers could register their own interest (POST to create subscription). If you view the subscription as its own resource, a contract between the publisher and subscriber, then I wouldn't view it as a breaking REST constraints (see State and Statelessness at page 217 of RESTful Web Services for the difference between application and resource state)
I assume you mean the server should notify the clients about events. I don't see how the specific technology matters here: you will face the same problems, and have to pick a solution from the same pool, regardless of using REST, SOAP-based web services, or any other alternative.
The basic question is, can your server initiate connections? Complementing this, can the clients listen to a port? If so, the client registers (sub), and the server notifies of events (pub). Both the registration operation and the notification events can be RESTful.
You need both server-initiated connections and listening clients. If either is not an option (e.g., because the client is a web browser), you will have to make do with polling (you can also look into something like websockets, if you're dealing with a browser). Design your polling carefully: the server response to the polling event should indicate a minimum delay before the client may poll again. The initial implementation of the server can return a constant for this delay value, but later on (assuming the clients are well-behaved) this will allow you to control the load on the server, differentiate between critical and less-critical clients, and so on.
And of course, the polling can be RESTful.
I don't see a reason why RESTful interfaces should not support events.
It will have to be done through polling, mind you; and that would be true even if you were to use SOAP instead.
While your web servers should definitely remain stateless, you probably do have a DB somewhere on the back end. You can use this DB to handle subscriptions to events by adding a subscription table.
Webhooks are the answer to this problem. They allow events without violating the REST principles.
Just a fast check on the REST constraints:
client-server architecture
stateless
cache
uniform interface
identification of resource
manipulation of resource through representations
self-desriptive messages
hypermedia of the engine of application state
layered system
code on demand (optional)
From the Fielding dissertation:
The client-server style is the most frequently encountered of the
architectural styles for network-based applications. A server
component, offering a set of services, listens for requests upon those
services. A client component, desiring that a service be performed,
sends a request to the server via a connector. The server either
rejects or performs the request and sends a response back to the
client.
Btw. an event based system would probably violate most of the constraints. It is hard to define things like hypermedia the engine of application state without clients (since the other name of application state is client state) and hyperlinks (since they are meaningless by pub/sub), and so on...
Anyways it is an interesting question how to design an event based system somewhat similar to REST. I think you should publish self-descriptive messages containing RDF, but that's just a tip. Polling can be a viable solution, but if I were you I would not try to force REST on an event based system...
update 2016.05.15.
As far as I understand the client - server architecture - Fielding describes here and here in his dissertation - uses always REQ/REP communication. The client sends the request and the REST service responds. If you want to have something like PUB/SUB without the violation of the client - server constraint, the only way to do that is the usage of polling. If you don't want to use polling, then ofc. you can use a REST service and a websocket service together, it is not forbidden...