I am a bit confused by the terminology of REST APIs being stateless. For example, if we had a To-Do list API, and one of the endpoints was used to update or delete entries, then each request does not happen in isolation.
If I create an entry before someone else queries the total entries, then their response will depend on my request.
But PUT is seen as one of the core REST verbs, even though it clearly changes state on the server. Can someone help me clear up my confusion?
Stateless means that you store the client state on the client and send it with each request, instead of storing it on the server. The latter is the classical server-side session: you have a session cookie holding the session id, and the server stores the session data in a database or on the file system. That does not scale well to Facebook-sized applications, which is why they send the session data with each request instead. You can ensure that the session data is not modified by the client by signing it with a private key stored on the server. There is a signature verification on each request, but it is still less expensive than maintaining session data for more than 1M users in a database and syncing it around the globe across multiple servers (which you would also need to avoid a single point of failure). Instead, the session data travels with each request, and if it passes verification, the request can be handled by whichever node the load balancer chooses, without touching a database to get session data.
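A minimal sketch of that signing idea in Python (the token layout and key handling here are illustrative, not any particular framework's format):

import base64
import hashlib
import hmac
import json

SECRET_KEY = b"server-side secret, never sent to the client"  # illustrative key

def issue_token(session_data):
    # Serialize the session data and sign it with the server's key.
    payload = base64.urlsafe_b64encode(json.dumps(session_data).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return payload + b"." + base64.urlsafe_b64encode(signature)

def verify_token(token):
    # Any node holding the key can verify the token without a database lookup.
    payload, signature = token.rsplit(b".", 1)
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(base64.urlsafe_b64encode(expected), signature):
        raise ValueError("tampered session data")
    return json.loads(base64.urlsafe_b64decode(payload))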
As for the part of the question about concurrent calls, that can be solved with resource versioning. You send the current ETag of the resource in an If-Match header with your PUT request, so the server can figure out which version your request is based on. If a newer version exists, the ETag won't match and the server will reject the request. There are other ways to handle concurrency; it always depends on your application.
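For example, with Python's requests library (URL and payload invented):

import requests

# Fetch the current representation and remember its version tag.
resp = requests.get("https://api.example.com/todos/42")
etag = resp.headers["ETag"]

# Only apply the update if the resource is still at that version.
update = requests.put(
    "https://api.example.com/todos/42",
    json={"title": "buy milk", "done": True},
    headers={"If-Match": etag},
)
if update.status_code == 412:
    # Precondition Failed: someone else changed the resource first.
    print("Stale version, re-fetch and retry")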
An auth system I work on has this new function:
1. Auth system allows users to specify Relying Parties they transact with,
2. The Relying Party can approve/deny/maybe the request (authorisation) - maybe causes a redirect to the RP website for further authorisation questions by the RP.
The RP has to implement a web service specified by the Auth System to perform the approve/deny/maybe request that the auth system generates.
My problem is what this looks like as a REST service. As the auth system can't really dictate the URI style for the RP system, I would like to specify that the path has no parameters in it; the auth system just needs to know the URI of the service. The data of the request (user name/id) might be a bit of JSON in the request body (suggesting the POST verb; GET might be OK, but I'm loath to expose user ids in the URI). The auth system does not care what the RP does with the request data; it just wants a "yes/no/maybe" reply (so it may not really fit the GET/POST/PATCH/DELETE paradigm).
What would be the best verb to use, and how should the reply be facilitated? It's not really a success/failure response, as there are three possible results to the query. Is it acceptable to have some JSON returned with the response (and then which HTTP verb should be used)?
I'm a bit baffled by this. GET seems the most obvious:
GET /api/user_link_authorize/{userid}
except then I'm forced to put user ids in the URI (which I don't want to do)...
Any suggestions?
My problem is what this looks like as a REST service.
Think about how it would look as a web site.
You would start with some known URI in your list of bookmarks. Fetching that page would give you a representation of a form, which would have input controls that describe what data needs to be provided (and possibly includes default values). The client provides the data it knows about, and submits the form. The data in the form is used to create an HTTP request as described by HTML's form processing rules. The response to that request includes a representation of the answer, or possibly the next bit of work to be done.
That's REST.
Retrieving the form (via the bookmarked URI) would be a GET, of course; we're just updating our locally cached copy of the form's "current" representation. Submitting the form could be a GET or a POST; we don't necessarily need to know that in advance, because that information is carried in the representation of the form itself.
GET vs POST involves a number of trade-offs. Semantically, GET is safe: it implies that the resource can be fetched at any time, that spiders can crawl it, that accessing the resource in that way is "free". Which is great when the request really is cheap, because clients on an unreliable network can automatically retry it if the response is lost. On the other hand, announcing to the world that the request is safe when it is actually expensive to produce responses is not a winning play.
Furthermore, GET doesn't support a message body (more precisely, the payload has no defined semantics). That means that information provided by the client needs to be part of the target resource identifier itself. If you are dealing with sensitive information, that can be problematic -- not necessarily in transit (you can use a secured socket), but certainly in making sure that the URI with sensitive information is not logged where the sensitive data can leak.
POST supports including a payload with the request, but it doesn't promise that the query is safe, which means that generic components won't know if they can automatically retry the request when a response is lost.
Given that you don't want the user id in the URI, that's a point against GET, and therefore in favor of POST.
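As a rough sketch of what the RP side might expose, assuming Flask and entirely made-up path and field names (the real contract is whatever the auth system specifies):

from flask import Flask, jsonify, request

app = Flask(__name__)

def decide(user_id):
    # Placeholder for the RP's own authorisation logic.
    return "maybe"

@app.route("/authorize-link", methods=["POST"])  # path is the RP's choice
def authorize_link():
    user_id = request.get_json()["user_id"]  # hypothetical field name
    decision = decide(user_id)  # "yes", "no" or "maybe"
    if decision == "maybe":
        # Tell the auth system where to redirect for follow-up questions.
        return jsonify({"result": "maybe",
                        "redirect": "https://rp.example.com/questions"})
    return jsonify({"result": decision})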
How is it possible to handle timeouts for time-consuming operations in a REST API? Let's say we have the following scenario as an example:
A client service sends a request to insert a resource through a REST API.
The timeout elapses. The client thinks the insertion failed.
The REST API keeps working and finishes the insertion.
The client is never notified of the insertion, so on its side the status stays "Failed".
I can think of a solution with a message broker: send orders to a queue and wait until they are resolved.
Any other workaround?
EDIT 1:
The POST-PUT pattern, as suggested in this thread.
A message broker (adds more complexity to the system).
A callback or webhook: pass a return URL in the request that the server API can call to let the client know the work is completed.
HTTP offers a set of properties for invoking certain methods: primarily safety, idempotency, and cacheability. The first guarantees a client that no data is modified; the second promises that a request can be reissued after connection issues, when the client doesn't know whether the initial request succeeded and only the response got lost midway. PUT, for example, provides idempotency.
A simple POST request to "insert" some data has none of these properties. A server receiving a POST request processes the payload according to its own semantics; the client does not know beforehand whether a resource will be created or whether the server will just ignore the request. If the server created a resource, it will inform the client via the Location HTTP response header, pointing at the location the client can retrieve it from.
PUT is usually used only to "update" a resource, though according to the spec it can also be used to create a new resource if it does not yet exist. As with POST, on successful resource creation the PUT response should include such a Location HTTP response header to inform the client that a resource was created.
The POST-PUT creation pattern separates the creation of the URI from the actual persistence of the representation: the client first fires off POST requests to the server until a response is received containing a Location HTTP response header, and then uses that header's URI in a PUT request to actually send the payload. As PUT is idempotent, the client can simply reissue the request until it receives a valid response from the server.
On sending the initial POST request, a client can't be sure whether the request reached the server and only the response got lost, or whether the initial request never made it to the server at all. As the request is only used to create a new URI (without any content yet), the client may simply reissue the request; in the worst case it just creates a new URI that points to nothing. The server may have a cleanup routine that frees unused URIs after a certain amount of time.
Once the client has the URI, it can use PUT to reliably send data to the server: as long as it hasn't received a valid response, it can just reissue the request over and over until it gets one.
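A client-side sketch of that pattern using Python's requests (endpoints invented; the fixed one-second retry delay is just for illustration):

import time
import requests

def create_reliably(payload):
    # Step 1: POST until we learn the URI of the new (still empty) resource.
    while True:
        try:
            resp = requests.post("https://api.example.com/todos", timeout=5)
            location = resp.headers["Location"]
            break
        except requests.RequestException:
            time.sleep(1)  # worst case we leave an unused URI behind

    # Step 2: PUT is idempotent, so it is safe to repeat until acknowledged.
    while True:
        try:
            resp = requests.put(location, json=payload, timeout=5)
            if resp.ok:
                return location
        except requests.RequestException:
            pass
        time.sleep(1)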
I therefore do not see the need to use a message-oriented middleware (MOM) using brokers and queues in order to guarantee reliable messaging.
You could also cache the data after a successful insertion, keyed by a previously exchanged request_id or something of that sort. But I believe a message broker with an asynchronous task runner is a much better way to deal with the problem, especially if your request threads are a scarce resource. What I mean by that is: if you are receiving a good amount of requests all the time, then it is a good idea to answer as quickly as possible so the workers stay available for incoming requests.
I am attempting to make a website's back-end API (I want to make the back-end independent of the front-end, so I'm only making a server-side API for now, abiding by RESTfulness as much as possible). I haven't done this before, so I'm unaware of the 'best' and most secure way to do things.
How I do it now:
Some parts of the API should only be accessible to a specific user, after they log in and for up to 24 hours afterwards.
To do this, whenever a user logs in I generate a random session ID on the server side (I'm using passwordless logins, so the user is assigned that ID when they click on a link in their email), and the server sends that session ID to the client once. The client then stores this session ID in localStorage (or in a file on disk if the client is not a web browser).
Next, I store that ID along with the associated email in my DB (MySQL table) on the server side.
Now every time the client wants something from my API, they have to provide the email and session ID in the URL (I don't want cookies for now), which the server checks against the pair in the DB; if they match, the server responds fully, otherwise it responds with an error.
After 24 hours, the server deletes the email/session ID pair and the user has to login again (to generate another session ID and associate it with their email).
Now the questions:
1. Is my method secure or does it have obvious vulnerabilities? Is there another battle-tested way I'm not aware of?
2. Is there a better way for the client to store the session ID (if they are a web browser)?
3. What is the best way to generate a unique session ID? Currently I generate a random 16-char string that I set as the primary key of the session-email table.
4. Is using a MySQL table the most performant/best way to store session IDs (given it will be queried with each request)?
5. Do I need to encrypt session IDs in any way? Is it secure for the client to send it as a 'naked' URL param?
Sorry for having too many questions in one post but I think they're related by the single scenario above. If it makes any difference, I'm using F# and I expect my client to either be an android app or a web app.
Your REST API MUST NOT know anything about the REST client's session, not even the session id. If you don't want to send a password with every request, all you can do is sign the user id and a timeout, so the service can authenticate based on the signature. Use a JSON Web Token: https://en.wikipedia.org/wiki/JSON_Web_Token
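With the PyJWT library, for example, signing the user id and an expiry could look like this (claim names beyond the standard exp are a matter of convention):

import datetime
import jwt  # PyJWT

SECRET = "signing key, stored only on the server"

def issue(user_id):
    # The token itself carries the user id and timeout; nothing is stored server-side.
    return jwt.encode(
        {"sub": user_id,
         "exp": datetime.datetime.now(datetime.timezone.utc)
                + datetime.timedelta(hours=24)},
        SECRET,
        algorithm="HS256",
    )

def authenticate(token):
    # Raises jwt.ExpiredSignatureError or jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET, algorithms=["HS256"])["sub"]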
You can have a server-side REST client, which can hold the session you described. The question is whether it is really worth the effort to develop a REST service instead of a regular web application. I am not sure in your case, but typically the answer is no: you won't have any 3rd-party REST clients, and your application doesn't have enough traffic to justify the layered architecture, or isn't big enough to split into multiple processes, etc...
If security is important, then you MUST use a true random generator (an algorithm or hardware): https://en.wikipedia.org/wiki/Random_number_generation#.22True.22_vs._pseudo-random_numbers It is not safe to send anything over plain HTTP; you must use HTTPS instead. And you MUST use the standard Authorization header instead of a query param: https://en.wikipedia.org/wiki/Basic_access_authentication
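In Python, for instance, the standard secrets module draws from the OS CSPRNG, and the token can then travel in the Authorization header (Bearer is just the common convention here):

import secrets
import requests

# 32 bytes from the OS CSPRNG, URL-safe encoded: suitable as a session id.
session_id = secrets.token_urlsafe(32)

# Later, the client sends it in a header instead of a query parameter,
# so it never ends up in server access logs alongside the URL.
requests.get("https://api.example.com/todos",
             headers={"Authorization": "Bearer " + session_id})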
For browser-based requests, a load balancer with sticky sessions enabled can pin each request to the same JVM out of the multiple JVMs in a cluster.
But when the request comes from a REST client rather than a browser, how can the load balancer pin requests to the same JVM, even with sticky sessions enabled? Any ideas, please.
A REST client is made to call a REST API, and REST APIs should be stateless, i.e. the complete information needed to process a request should be present in the request itself, so the request should not depend on any session data.
If your API depends on session data, then it is not really following the principles of REST.
If your requirement is such that you need to maintain state, then it should be maintained on the client side, not on the server. One way I would suggest is to use cookies to store your state and temporary data, and to attach that cookie to every REST API call you make.
You can make the cookie configurable so that it is controlled by the server and no one else can change it.
The load balancer uses Cookies to keep track of sessions. Retaining the cookies and sending them back in the client should be enough to get the expected result.
For instance, in Python, that would mean replacing requests.get(url) with:
import requests

s = requests.Session()
# ... the Session object retains cookies across calls ...
s.get(url)
I've been searching for best practices for preventing the accidental creation of duplicate resources when using POST to create a new resource, for the case where the resource is to be named by the server and hence PUT can't be used. The API I'm building will be used by mobile clients, and the situation I'm concerned about is when the client gets disconnected after submitting the POST request but before getting the response. I found this question, but there was no mention of using a conditional POST, hence my question.
Is doing a conditional POST to the parent resource, analogous to using a conditional PUT to modify a resource, a reasonable solution to this problem? If not, why not?
The client/server interaction would be just like with a conditional PUT:
Client GETs the parent resource, including the ETag reflecting its current state (which would include its subordinate resources),
Client does a conditional POST to the parent resource (includes the parent's ETag value in an If-Match header) to create a new resource,
Client gets disconnected before getting the server response, so doesn't know if it succeeded,
Later, when reconnected, the client resubmits the same conditional POST request,
Either the earlier request didn't reach the server, so the server creates the resource and replies with a 201, or the earlier request did reach the server, so the server replies with a 412 and the duplicate resource isn't created.
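In terms of HTTP, the flow above might look like this with Python's requests (URLs invented; whether a server evaluates If-Match on POST this way is up to its implementation):

import requests

# Get the parent collection and its current version tag.
parent = requests.get("https://api.example.com/todos")
etag = parent.headers["ETag"]

# Conditional POST: only create if the collection is unchanged.
resp = requests.post("https://api.example.com/todos",
                     json={"title": "buy milk"},
                     headers={"If-Match": etag})
if resp.status_code == 201:
    print("created:", resp.headers["Location"])
elif resp.status_code == 412:
    print("parent changed since our GET; earlier POST may have succeeded")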
Your solution is clever, but less than ideal. Your client may never get his 201 confirmation, and will have to interpret the 412 error as success.
REST aficionados often suggest you create the resource with an empty POST, then, once the client has the id of the newly created resource, have it do an "idempotent" update to fill it in. This is nice, but you will likely need to make DB columns nullable that wouldn't otherwise be, and your updates are only idempotent if no one else is trying to update at the same time.
In my experience, HTTP is flaky. Requests time out, browser windows get closed, connections get reset, trains go into tunnels with mobile users aboard. There's a simple, robust pattern for dealing with this: unsafe actions should always be uniquely identified, and servers should store, and be able to repeat if necessary, the response to any unsafe request. This is not HTTP caching, where a request may be served from cache but the cache may be flushed for whatever reason; this is a guarantee by the server application that if an "action" request is seen a second time, the stored response will be repeated without anything else happening. If the action identity is to be generated by the server, then a request-response pair should be dedicated just to handing out the id. If you implement this for one unsafe request, you might as well do it for all of them, and in so doing you will escape numerous thorny problems: successive update requests wiping out other users' changes, hitting incompatible states ("order already submitted"), successive delete requests generating 404 errors.
I have a little google doc exploring the pattern more fully if you're interested.
I think this scheme would work. If you want to ensure POST does not result in duplicates, you need the client to send something unique in the POST. The server can then verify uniqueness.
You might as well have the client generate a GUID for each request, rather than obtaining this from the server via a GET.
Your steps then become:
Client generates a GUID
Client does a POST to the resource, which includes the GUID
Client gets disconnected and doesn't know if it succeeded
Client connects again and does another POST with the same GUID
Server checks the GUID, and either creates the resource (if it never received the first POST) or indicates that this was a duplicate
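A server-side sketch of that check, assuming Flask and a made-up "Idempotency-Key" request header carrying the GUID; a real implementation would use a durable store rather than a dict:

from flask import Flask, jsonify, request

app = Flask(__name__)
seen = {}  # GUID -> URL of the resource it created

def persist(payload):
    # Placeholder for real creation logic; returns the new resource's URL.
    return "https://api.example.com/todos/123"

@app.route("/todos", methods=["POST"])
def create_todo():
    guid = request.headers["Idempotency-Key"]
    if guid in seen:
        # A retry of a request we already handled: repeat the old answer.
        return jsonify({"url": seen[guid]}), 200
    url = persist(request.get_json())
    seen[guid] = url
    return jsonify({"url": url}), 201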
It might be more RESTful to use PUT and have the client decide the resource name. If you did not like the chosen name, you could indicate that you had created the resource but that its canonical location was somewhere of the server's choosing.
Why not simply do duplicate detection on the server based on the actual resource, using whatever internal mechanism the server chooses to use?
It's just safer that way.
Then you return the URL to the appropriate resource (whether it was freshly created or not).
If the parent's ETag is based on the state of its sub-resources, then it's not a reliable mechanism for detecting "duplicate resources". All you know is that the parent has "changed", somehow, since last time. How do you even know it's because your old POST was processed after the disconnect? Anything could have changed that ETag.
This is basically an optimistic locking scenario playing out, and it comes down to another question: if the resource is already created, what then? Is that an error? Or a feature? Do you care? Is it bad to send a creation request that's silently ignored by the server when the resource already exists?
And if it already exists but is "different" enough (say the name matches but the address is different), is that a duplicate? Is that an update? Is it an error for trying to change an existing resource?
Another solution is to make two trips: one to stage the request, another to commit it. You can query the status of the request when you come back if it's interrupted. If the commit didn't go through, you can commit it again. If it did, you're happy and can move on.
Whether you want to jump through these hoops to do it safely just depends on how unstable your comms are and how important this particular operation is.
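For what it's worth, the two-trip stage/commit flow might look like this from the client's side (endpoints and status values invented):

import requests

# Trip 1: stage the operation; the server records it but does nothing yet.
staged = requests.post("https://api.example.com/operations",
                       json={"action": "create-todo", "title": "buy milk"})
op_url = staged.headers["Location"]

# Trip 2: commit it. If the connection drops, query op_url on reconnect;
# if the status is still "staged", it is safe to commit again.
requests.post(op_url + "/commit")
status = requests.get(op_url).json()["status"]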