From my point of view, session violates RESTfulness either when it is stored in the memory or db.
In case of session stored in the memory, it is hard to scale up the server using like load-balancing since servers don't share the session data.
Likewise in case of session stored in database, the database will be over-loaded when many servers simultaneously make queries.
My issue is related with the second case.
For a long time, I had thought that the server and database are different.
My past-assumption)
When client make request to the server with specific data, then server stores that data in the database like mysql or mongo etc.
So server don't have to care about the client's state since database has all control over them.
Server can stand alone against the client's request since server can make query to the database whenever wants to know who the client is.
So my two questions are that,
Whenever mentioning that "RESTful server stand alone against the client's request", is that 'server' includes database?
If yes, and if there is a User model and a Post-model associated with one-to-many relation, isn't that also violates RESTfulness?
I am sure that the second question makes no sense, since if the second question's answer is true, then RESTapi would have never been that useful.
But I cannot understand the difference between session in the database violates RESTfulness and the User-Post does not violate Restfulness.
I think that both of them are following the same procedure, client-server-database.
How can I understand this issue easily?
What's generally meant with statelessness is, summed up that : all the information to execute the HTTP request is self-contained within the request.
Some implications are that:
I can disconnect the TCP socket and re-open it, or I can keep a TCP connection open and it makes no difference. All the information to execute the request is contained within the request.
In the case of idempotent methods, I can re-do the exact same request and end up with the same state as if I only did it once.
In other words,
There are more complete descriptions of statelessness and HTTP, but the important thing is that statelessness here does NOT mean that the server cannot have any state at all. Most REST services are probably useless if there is no state.
Now to the question of whether having a session violates REST principals. I think it's hard to objectively state this either way. The important parts in relation to your question is that to be RESTful you need a concept of resources, a concept of being able to address them and a concept of transferring state between client and server. (there are more things that make up a REST service, but here are a few relevant bits).
I don't think having a means of authentication prevents this, whether authentication is done via an Authorization header or a Cookie header is not really that relevant.
If the session cookies and associated session data starts interfering with this process in other ways, it might be possible for a session-related feature to violate REST principals, but I don't think this is generally true.
If there are 'many articles' saying that sessions violate REST, I don't think there is any real basis for this. There is a lot of garbage descriptions of REST going around. I do think it might be bad for other reasons to use cookies for authentication. It does create potential for security issues.
Restful server is separate from the database.
Restful server, if there is such a thing, is just a web server.
REST is just an architecture, say a methodology, that delivers information from server to client and vice versa over HTTP.
Related
According to this in property 3 and property 4,
Stateless
Roy fielding got inspiration from HTTP, so it reflects in this
constraint. Make all client-server interactions stateless. The server
will not store anything about the latest HTTP request the client made.
It will treat every request as new. No session, no history.
No client context shall be stored on the server between requests. The
client is responsible for managing the state of the application.
But then again,
In REST, caching shall be applied to resources when applicable, and
then these resources MUST declare themselves cacheable. Caching can be
implemented on the server or client-side
How is the server being stateless if it can cache information?
tldr: Stateless refers the behavior of a server to not record any information on the client's behalf between calls.
Caches are used as a server optimization strategy for resources that are requested often (and do not change frequently).
If a server is "stateless" this mean that no information will be held on the server side on the clients behalf between requests. Thus each request that a client makes must contain all of the required information for the server to perform the desired action. Irrespective of how many calls the client has made on this server previously.
Stateless means there is no memory of the past. Every request is performed as if it were being done for the very first time.
Stateful means that there is memory of the past. Previous request are remembered and may impact behavior of the current Request
Caching is merely holding a copy of a resource that the server is responsible of serving. Caching is commonly used for highly requested resources. Caching strategies can be used by both stateless and stateful services.
In REST when designing an api, you can have Stateless iteration with your clients and you can use caching to store highly requested items in memory to save IO calls to disk.
The authoritative reference for REST is Fielding's dissertation.
Fielding's definition of stateless is found in the discussion of network-based architectural styles
The client-stateless-server style derives from client-server with the additional constraint that no session state is allowed on the server component. Each request from client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server. Session state is kept entirely on the client.
It may help to contrast this idea with FTP, where the server is expected to track session state.
RETR example.txt
In FTP, to interpret that RETR command correctly, you need to know what the current working directory is for the session, and the clues that tell you that are stored in earlier messages.
Because HTTP requests are self contained, you don't need "sticky" session management.
As Fielding himself notes, Cookies violate the stateless rest constraint:
The same functionality should have been accomplished via anonymous authentication and true client-side state.
Someone asked me this question a couple of days ago and I didn't have an answer:
As HTTP is a stateless protocol. When we open www.google.com, can it
be called a REST call?
What I think:
When we make a search on google.com all info is passed through cookie and URL parameters. It looks like a stateless request.
But the search results are not independent of user's past request. Search results are specific to user interest and behavior. Now, it doesn't look like a stateless request.
I know this is an old question and I have read many SO answers like Why HTTP is a stateless protocol? but I am still not able to understand what happens when user activity is tracked like on google or Amazon(recommendations based on past purchases) or any other user activity based recommendation websites.
Is it RESTful or is it RESTless?
What if I want to create a web app in which I use REST architecture and still provide user-specific responses?
HTTP is stateless, however the Google Application Layer is not. The specific Cookies and their meaning is part of the Application Layer.
Consider the same with TCP/IP. IP is a stateless protocol, but TCP isn't. The existence of state in TCP embedded in IP packets does not mean that IP protocol itself has a state.
So does that make it a REST call? No.
Although HTTP is stateless & I would suspect that www.google.com when requested with cookies disabled, the results would be the same for each request, making it almost stateless (Google still probably tracks IP to limit request frequency).
But the Application Layer is not stateless. One of the principles of REST is that the system does not retain state data about about the client between requests for the purpose of modifying the responses. In the case of Google, that clearly is not happening.
It seems that the meaning of "stateless" is being (hypothetically) taken beyond its practical expression.
Consider a web system with no DB at all. You call a (RESTful) API, you always get the exactly the same results. This is perfectly stateless... But this is perfectly not a real system, either.
A real system, in practically every implementation, holds data. Moreover, that data is the "resources" that RESTful API allows us to access. Of course, data changes, due to API calls as well. So, if you get a resource's value, change its value, and then get its value again, you will get a different value than the first read; however, this clearly does not say that the reads themselves were not stateless. They are stateless in the sense that they represent the very same action (or, more exact, resource) for each call. Change has to be manually done, using another RESTful API, to change the resource value, that will then be reflected in the next call.
However, what will be the case if we have a resource that changes without a manual, standard API verb?
For example, suppose that we have a resource that counts the number of times some other resource was accessed. Or some other resource that is being populated from some other third party data. Clearly, this is still a stateless protocol.
Moreover, in some sense, almost any system -- say, any system that includes an authentication mechanism -- responds differently for the same API calls, depending, for example, on the user's privileges. And yet, clearly, RESTful systems are not forbidden to authenticate their users...
In short, stateless systems are stateless for the sake of that protocol. If Google tracks the calls so that if I call the same resource in the same session I will get different answers, then it breaks the stateless requirement. But so long as the returned response is different due to application level data, and are not session related, this requirement is not broken.
AFAIK, what Google does is not necessarily related to sessions. If the same user will run the same search under completely identical conditions (e.g., IP, geographical location, OS, browser, etc.), they will get the very same response. If a new identical search will produce different results due to what Google have "learnt" in the last call, it is still stateless, because -- again -- that second call would have produced the very same result if it was done in another session but under identical conditions.
You should probably start from Fielding's comments on cookies in his thesis, and then review Fielding's further thoughts, published on rest-discuss.
My interpretation of Fielding's thoughts, applied to this question: no, it's not REST. The search results change depending on the state of the cookie header in the request, which is to say that the representation of the resource changes depending on the cookie, which is to say that part of the resource's identifier is captured in the cookie header.
Most of the problems with cookies are due to breaking visibility,
which impacts caching and the hypertext application engine -- Fielding, 2003
As it happens, caching doesn't seem to be a big priority for Google; the representation returned to be included a cache control private header, which restricts the participation by intermediate components.
I have an app that connects to different databases on a mongodb instance. The different databases are for different clients. I want to know if my clients data will be compromised if I used a single user to login to the different databases. Also, is it a must for this user to be root? to readWrite role will do the trick. I'll be co connecting to the databases through a java backend.
There is no straightforward answer to this. It's about risk and cost-benefit.
If you use the same database user to connect to any database, then client data separation depends much more on business logic in your application. If any part of your code can just decide to connect to any client database, then a request from one client may (and according to my experience, eventually will) end up in a different client's database. Some factors make this more likely to happen, like for example if many people develop your app for a longer time, somebody will make a mistake.
A more secure option would be to have a central piece or component that is very rarely changed with changes strictly monitored, which for each client session (or even request) would take the credentials accroding to the client and use that to connect to the database. This way, any future mistake by a developer would be limited in scope, they would not be able to use the wrong database for example. And then we haven't mentioned non-deliberate application flaws, which would allow an attacker to do the same, and which are much more likely. If you have strong enforcement and separation in place, an malicious user from one client may not be able to access other clients data even in case of some application vulnerabilities, because the connection would be limited to the right database. (Note that even in this case, your application needs to have access to all client database credentials, so a full breach of your application or server would still mean all client data lost to the attacker. But not every successful attack ends in total compromise.)
Whether you do this or not should depend on risks. One question for you to answer is how much it would cost you if a cross-client data breach happened. If it's not a big deal, probably separation in business logic is ok. If it means going out of business, it is definitely not enough.
As for the user used for the connection should be root - no, definitely not. Following the principle of least privilege, you should use a user that only has rights to the things it needs to, ie. connecting to that database and nothing else.
Trying to understand REST HATEOAS:
Suppose I have a service that has state; they are: initial, ready, running. I have a client that connects to the service, obtains a page with links that allow it to mutate the service state.
It uses one of the links to change the service's state and obtains another page with new links.
As long as there is 1 client, the state the client holds is identical with the service. But if there is a second client and it changes the service's state, the first client's representation is stale.
How is this resolved in HATEOAS? From what I've read it seems that REST is not applicable and I should maybe look at something else. If so, what?
Thanks!
This is not resolved by HATEOS (entirely). As REST is stateless this is kind of a paradox use case to keep state in client and server aligned.
Assuming I understand your requirements, yes, you're state in client 1 is stale and not the same as the one on the server. But what if the client would make a periodic call to the server to see whether some other client changed it? If so, with HATEOAS you could provide a link to serve the current state and omit the link to change the state.
#Kay - Thanks for answering.
I'm going to try to answer my own question. I realized after reading your answer that the "application" in HATEOAS is really the virtual application the client experiences when it retrieves and processes the resources it gets from the server. Its states are the pages (resources) it transitions between. The server (service?) may have its own state but it's not the same as the client's.
As long as this distinction is kept in mind, it is not unRESTful to have stale links in client 1. The server simply responds with new links reflecting its own state. And the client makes new transitions based on the updated links.
Still trying to understand. If I have it wrong, I'd appreciate some help.
Thanks!
The stateless requirement of REST refers to the ability of the server to understand and process the client request independent from any previous interactions it has made with said client. In other words, the client should be able to send a request "out of the blue" to the server (I.e., without a session saved on said server) and have the request processed. Hence there isn't a concept of login and logout in a purely RESTful architecture.
That's a different constraint than HATEOAS. Basically, "hypermedia as the engine of application state" means that all state is conveyed through the media type being used and not the connection itself. The client can (and often does) keep its own state, and can request snapshots of the state of resources from the server through resource representations (a.k.a media types).
If you want to be notified when a resource changes state, REST is (probably) the wrong choice. You'd likely want to use a different application protocol than, say, HTTP.
As Fielding says: it's not REST without HATEOAS. Don't call your service REST if it's ignore HATEAOS and stateful service can not be REST. You understood HATEOA. The server provides hyperlinks for the client which should be use to change the state located at the CLIENT SIDE.
To solve your problem: omit tend server from any state information. It will easier your life. Then implement REST as using the Richardson Maturity Model while consider information's from here.
Assume the following scenario A web application serves up resources through a RESTful API. A number of clients consume this API. The goal is to keep the data on the clients synchronized with the web application (in both directions).
The easiest way to do this is to ask the API if any of the resources have changed since the client last synchronized with the API. This means that the client needs to ask the API for the appropriate resources accompanied by timestamp (to see if the data needs to be updated). This seems to me like the approach with the least overhead in terms of needless consumption of bandwidth.
However, I have the feeling that this approach has a few downsides in terms of design and responsibilities. For example, the API shouldn't have to deal with checking whether the resources are out of date. It seems that the only responsibility of the API should be to serve up the resources when asked without having to deal with the updating aspect. By following this second approach, the client would ask for a lot of data every time it wants to update its data to keep it synchronized with the web application. In other words, the client would check whether the data it got back is newer than the locally stored data. If this process takes place every few minutes, this might become a significant burden for the system.
Am I seeing this correctly or is there a middle road that I am overlooking?
This is a pretty common problem, and a RESTful approach can help you solve it. HTTP (the application protocol typically used to build RESTful services) supports a variety of techniques that can be used to keep API clients in sync with the data on the server side.
If the client receives a Last-Modified or E-Tag header in a HTTP response, it may use that information to make conditional GET calls in the future. This allows the server to quickly indicate with a 304 – Not Modified response that the client’s previously stored representation of the resource is still valid and accurate. This will allow the server (or even better, an intermediate proxy or cache server) to be as efficient as possible in how it responds to the client’s requests, potentially reducing costly round-trips to a back-end data store.
If a response contains a Last-Modified header and the client wishes to take advantage of the performance optimization available with it, they must include an If-Modified-Since directive in a subsequent GET call to the same URI, passing in the same timestamp value it received. This instructs the server to only GET the information from the authoritative back-end source if it knows it has changed since that time. Your server will have to be built to support this technique, of course.
A similar principle applies to E-Tag headers. An E-Tag is a simple hash code representing a specific state of the resource at a particular point in time. If the resource changes in any way, so does its E-Tag value. If the client sees an E-Tag in a response it should pass it in subsequent GET requests to the same URI, thereby allowing the server to quickly determine if the client has an up-to-date representation of the resource.
Finally, you should probably look at the long polling technique to reduce the number of repeated GET requests issued by your clients to the server. In essence, the trick is to issue very long GET requests to the server to watch for server data changes. The GET will not return a response until either the data has changed or the very long timeout fires. If the latter, the client just re-issues the same long-lived request to watch for changes again. See also topics like Comet and Web Sockets which are similar in approach.