I am making a transition from SOAP to REST, and I would like to convince my colleagues that this is a good move. We don't need the extra security mechanisms that SOAP can provide. For us the overhead of SOAP and WSDL has only proven to be a headache over the years.
Apart from the obvious simplifications, one really valuable advantage for our system would be the HTTP caching mechanisms. I have read things on the subject, but I still don't fully understand why these caching mechanisms can't be applied to SOAP messages.
Is it simply because REST by convention encodes all the parameters in the url? Since a GET call can also have a body with parameters, I understand that it's not restricted for REST, but the caching mechanisms don't work if you do so?
SOAP, when using HTTP as the transfer mechanism, is sent via HTTP POST requests. As HTTP POST is non-idempotent, it is not cached at the HTTP level.
REST may be cached, provided the requests in question are idempotent ones: GET, PUT and (theoretically) DELETE. Any POST requests still won't be cached. That said, if you are planning to do a lot of work with this, you should look into how caches check whether they are still valid. In particular, the ETag header will be a good way to implement caching providing you've got a cheap way to compute what the value should be.
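To make the ETag suggestion concrete, here is a minimal sketch of the server side of a conditional GET. The helper names (make_etag, conditional_get) are made up for illustration, and hashing the whole body is only the simplest way to get a validator; a version number or last-modified timestamp is usually cheaper if you have one. The sketch also ignores weak ETags (the W/ prefix).

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a quoted ETag value from the response body (hypothetical helper)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, payload) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    # If-None-Match may carry several comma-separated tags, or "*".
    client_tags = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in client_tags or "*" in client_tags:
        # The client's cached copy is still valid: answer 304 with no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The first request gets a 200 plus an ETag; a repeat request that sends the tag back in If-None-Match gets a body-less 304 until the representation changes.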
Is it simply because REST by convention encodes all the parameters in the url? Since a GET call can also have a body with parameters, I understand that it's not restricted for REST, but the caching mechanisms don't work if you do so?
REST does not prescribe any particular mechanism for how to encode request parameters in the URL. What is recommended is that clients never do any URL synthesis: let the server do all the URL creation (by whatever mechanism you want, such as embedding within the path or as query parameters). What you definitely shouldn't do is have a GET where the client sends a body to the server! Any cache is likely to lose that. Instead, when that request doesn't correspond to a simple fetch of some resource, you POST or PUT that complex document to the server and GET the result of the operation as a separate stage.
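The "POST the complex document, then GET the result as a separate stage" idea could look roughly like the in-memory sketch below. SearchService, the /searches path and the way the "result" is computed are all invented for illustration; the point is only that the second step is a plain, body-less GET on a URL the server handed out.

```python
import itertools

class SearchService:
    """Toy server: POST creates a result resource, GET fetches it (hypothetical API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._results = {}

    def post_search(self, criteria: dict):
        """POST /searches: store the result and answer 201 with a Location to GET."""
        search_id = next(self._ids)
        # Stand-in for real work: the "result" is just the sorted criteria.
        self._results[search_id] = sorted(criteria.items())
        return 201, {"Location": f"/searches/{search_id}"}

    def get(self, path: str):
        """GET /searches/<id>: an ordinary fetch that a cache can store."""
        search_id = int(path.rsplit("/", 1)[-1])
        if search_id not in self._results:
            return 404, None
        return 200, self._results[search_id]
```

The client never builds the URL itself; it just follows the Location header from the POST response.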
A POST of a complex document that returns the result of the operation as another complex document is pretty much how SOAP is encoded. If you're tempted to do that, you might as well use SOAP directly as that's got much more mature tooling in many languages. If you want to run REST that way, you're probably doing it wrong somewhere…
Stating that POST requests are non-idempotent in general is not correct. POST should be used for non-idempotent operations, but SOAP does not really use HTTP correctly, and as a result it can happen that POST is used for an idempotent operation, for example fetching an entity by ID (for which you would normally use GET).
Also, according to this: Is it possible to cache POST methods in HTTP? (check the answer of reBoot), it is possible to cache POST requests according to the RFC. It may not be implemented very well by browsers and proxies, though. Anyway, if you add appropriate HTTP headers, it should be possible to cache SOAP-over-HTTP messages using HTTP caching mechanisms.
Another reason could be that the SOAP URI is always that of the SOAP server; it does not identify which resource is being requested or needs to be cached. So a cache server cannot cache something it doesn't know about. REST, however, has a URI for each resource (there can be multiple URIs for the same resource), which helps a caching server do its job.
On top of this, as mentioned in other answers, only some HTTP verbs are used for caching, like GET, PUT etc. (mostly GET).
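A rough way to see the "one URI for everything" problem is to look at what a simple cache would use as its key. The sketch below assumes a cache that only stores GET responses and keys them by URL, which is a simplification of what real caches do; the example.test URLs are placeholders.

```python
def cache_key(method: str, url: str):
    """Return the key a simple cache would use, or None if it will not cache."""
    if method.upper() != "GET":
        return None
    return url

# Two different SOAP operations: same verb, same URL, the difference is
# hidden inside the XML body where a generic cache does not look.
soap_requests = [
    ("POST", "https://example.test/soap"),
    ("POST", "https://example.test/soap"),
]
# The REST equivalents: each resource has its own URL.
rest_requests = [
    ("GET", "https://example.test/customers/42"),
    ("GET", "https://example.test/orders/7"),
]

soap_keys = [cache_key(m, u) for m, u in soap_requests]
rest_keys = [cache_key(m, u) for m, u in rest_requests]
```

Here soap_keys is [None, None], while rest_keys holds two distinct URLs the cache can store and look up independently.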
Related
What's the purpose of the HTTP-methods GET, POST, PUT, DELETE, HEAD etc.
Is it just a convention to indicate, what I'm intending to do? Or is it, that certain HTTP-features are only available if I use a specific method?
GET requests:
GET requests can be cached
GET requests remain in the browser history
GET requests can be bookmarked
GET requests should never be used when dealing with sensitive data
GET requests have length restrictions
GET requests are only used to request data (not modify)
POST requests:
POST requests are never cached
POST requests do not remain in the browser history
POST requests cannot be bookmarked
POST requests have no restrictions on data length
Now I hope you understand the difference between GET and POST. In a similar way, every method has a different purpose.
REST enables intermediate processing by constraining messages to be self-descriptive: interaction is stateless between requests, standard methods and media types are used to indicate semantics and exchange information, and responses explicitly indicate cacheability. -- Fielding, 2000
The important idea here is that the self descriptive messages of REST communicate semantics in such a way that general purpose components can understand them and do something useful with that information.
For instance, if we know that the semantics of a message are idempotent, we can automatically resend that message when a response is lost.
We know that if the semantics of a message are safe, then we can send that request purely speculatively. This not only allows optimizations like pre-fetching content, but it also means that we can safely crawl the web, looking for interesting documents.
Because the message semantics are visible in readily standardizable forms, we can drop in general purpose, off the shelf components, and everything just works.
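As a sketch of what such a general purpose component can do with nothing but the method name: the retry helper below resends a request after a connection failure only when the method is idempotent. The send callable is a hypothetical transport function supplied by the caller, not a real library API.

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}

def send_with_retry(method, url, send, max_attempts=3):
    """Call send(method, url); retry on ConnectionError only for idempotent methods."""
    # A non-idempotent request (e.g. POST) gets exactly one attempt.
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(1, attempts + 1):
        try:
            return send(method, url)
        except ConnectionError:
            if attempt == attempts:
                raise  # Out of attempts: report the failure to the caller.
```

Note that the helper knows nothing about the application; the decision is made purely from the HTTP method.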
SOAP is an example of message passing that did not have standardized semantics. Messages were XML documents that could only be interpreted correctly by components familiar with that specific schema (usually via the WSDL).
On the web, that meant that, regardless of meaning, every request was sent via POST. What that does, in effect, is reduce HTTP from an application protocol to a transport protocol. And a lot of the power of the web goes away when you do that.
I want to perform two POST requests to two different endpoints, is it possible to do this in a transactional way?
No.
There's no rule that says that the effects of a POST will be scoped to precisely one resource. On the web, we can submit one HTML form that changes multiple pages. Implementations have a lot of freedom when processing a request (HTTP constrains message meaning, not handler implementations).
POST /replace/these/resources/with/empty/documents
Content-Type: text/plain

/A
/B/C
/D/E/F?g
A piece that is missing, however, is the ability to communicate to general purpose components which resources have been changed when handling the POST request. The mechanism described by RFC 7234 handles only a few simple cases, and does not (as far as I can tell) extend generally.
You get at most three:
effective request URI
Location
Content-Location
and both Location and Content-Location mean something, so you may end up with a mess if you try to force them out of their normal semantics.
You can, of course, send back response that includes in the payload a list of all of the resources that were changed by your transaction (if that's practical), but you don't have any way to lift that information into the metadata where general purpose components can use it.
Aside from the obvious answer of "Best Practice" and "Creates a Standard" is there a technical argument for using GET, POST, PUT, DELETE, etc. instead of doing everything with POST?
In addition to creating a Uniform Interface, the different verbs have different properties:
safe (GET, HEAD, OPTIONS, TRACE)
The purpose of distinguishing between safe and unsafe methods is to allow automated retrieval processes (spiders) and cache performance optimization (pre-fetching) to work without fear of causing harm.
idempotent (PUT, DELETE, GET, HEAD, OPTIONS, TRACE)
Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response.
cacheable (GET, HEAD, POST)
Request methods can be defined as "cacheable" to indicate that responses to them are allowed to be stored for future reuse;
All the intermediate servers that make up "the internet" rely on the uniform interface and those properties to work correctly. That includes Content Delivery Networks, proxies, and caches. Using verbs correctly helps the internet work better.
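The three properties listed above can be written down as a small lookup table, which is roughly what a generic intermediary has to work with. The function name and the dictionary keys are my own labels, not anything from the spec.

```python
SAFE = {"GET", "HEAD", "OPTIONS", "TRACE"}
IDEMPOTENT = SAFE | {"PUT", "DELETE"}
CACHEABLE = {"GET", "HEAD", "POST"}

def permitted_optimizations(method: str) -> dict:
    """Map an HTTP method to what a generic component may do with it."""
    name = method.upper()
    return {
        "prefetch_or_crawl": name in SAFE,      # no harm in requesting speculatively
        "automatic_retry": name in IDEMPOTENT,  # may be resent after a lost response
        "store_response": name in CACHEABLE,    # response may be kept for reuse
    }
```

If everything is sent as POST, the first two entries are always False, and the third only helps when the response carries explicit freshness information.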
is there a technical argument for using GET, POST, PUT, DELETE, etc. instead of doing everything with POST?
Yes.
In the case of HTTP, one of the important reasons to choose GET over POST is caching. RFC 7234 defines a bunch of caching semantics that can be written into messages (specifically into the headers).
One of the interesting problems of computer science is cache invalidation; in HTTP, there is a specific semantic for invalidation that is related to methods:
A cache MUST invalidate the effective Request URI (Section 5.5 of [RFC7230]) as well as the URI(s) in the Location and Content-Location response header fields (if present) when a non-error status code is received in response to an unsafe request method.
In other words, part of the distinction between POST and GET (also HEAD), is that generic clients know to invalidate cached representations of resources. So I can use a third party headless browser to talk to your API, and caching "just works".
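A minimal sketch of that invalidation rule, assuming a toy cache that maps URIs to stored representations. TinyCache and its method names are invented for illustration, and the sketch leaves out details such as the restriction on invalidating Location URIs that point at a different host.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}

class TinyCache:
    """Toy cache keyed by URI (hypothetical, for illustration only)."""

    def __init__(self):
        self.store = {}

    def put(self, uri, representation):
        self.store[uri] = representation

    def on_response(self, method, request_uri, status, response_headers):
        """Invalidate entries after a non-error response to an unsafe method."""
        if method.upper() in SAFE_METHODS or status >= 400:
            return
        targets = [
            request_uri,
            response_headers.get("Location"),
            response_headers.get("Content-Location"),
        ]
        for uri in targets:
            if uri is not None:
                self.store.pop(uri, None)
```

After a successful POST to /orders that answers with Location: /orders/7, both /orders and /orders/7 are dropped from the cache, while unrelated entries stay.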
The lines between POST and the other unsafe methods are less clear, but present. The basic outline is that POST can mean anything - but PUT and DELETE both specifically describe idempotent operations. So once again, I don't need to customize a client that does the right thing if messages are being lost on the unreliable network -- I can use a domain agnostic HTTP client to PUT and DELETE, and rebroadcast of lost messages can again "just work".
The power of the distinctions between methods is that we can create and re-use generic intermediate components (headless browsers, caches, reverse proxies) that know HTTP semantics, and (because we've described our domain protocols using those semantics) they all "just work" -- the generic components are able to do useful work even though they have no idea what is really going on.
Those verbs are part of the HTTP specification, which is the primary reason we use them when we do REST over HTTP. REST over HTTP was adopted early on, so the verbs became part of best practice.
by considering the application of REST principles in the web.
I am doing a case study on REST and I have some doubts, mostly about the Uniform Interface.
I assume that the Uniform Interface has only a single PROCESS method instead of the HTTP verbs (e.g. GET, POST, PUT, DELETE, HEAD, ...). Are there any potential consequences of this kind of process compared with the conventional HTTP verbs?
Are there any potential consequences of this kind of process compared with the conventional HTTP verbs?
There are a few.
One consideration is safety. RFC 7231 defines safe this way:
Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource.
So if PROCESS were a safe verb, like GET, you would have an analog of the read-only web. The HTTP spec also defines HEAD and OPTIONS (which are optimized reads) and TRACE (a debugging tool); the fact that HTML has been an extremely successful hypermedia format without supporting these methods suggests that they aren't particularly critical in themselves.
A safe specification of PROCESS preserves all of the scaling benefits of REST. But its utility is limited: clients can consume content, but they can't produce any.
On the other hand, if PROCESS isn't safe, then a bunch of use cases can no longer be supported. Prefetching content is no longer an option, because the components can no longer assume that invoking PROCESS has no side effects on the server. Crawling is no longer an option, for the same reason.
It's probably worth noticing the mechanics of methods in the web -- it's the hypermedia format that describes which methods are appropriate for which links. So you could potentially work around some of the issues by defining the restrictions on the method within the hypermedia format itself. It's a different way of communicating the same information to any component that can consume the media type in question.
But there are at least two additional considerations. First, the information in the links can only be understood by components that know that media type. On the web, most components are media type agnostic -- caching HTML looks exactly like caching JSON, which looks exactly like caching JPEG.
Second, the information in the links is only available on the outbound side. REST means stateless - all of the information needed to process the request is contained within the request. That implies that the request must include within it all of the information needed to handle a communication failure.
For instance, the http spec defines idempotent.
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
This property is important for intermediary components when they forward a request along an unreliable network and receive no response from the server. We've got no way to know if a message is lost, or just slow, and we've got no way to distinguish the request message being lost from the response message being lost.
If the request includes the information that it is idempotent, then the intermediaries know that they can resend the message to the server, rather than reporting the error to the client.
Contrast this with correct handling of POST in HTTP: since the POST request does not have an idempotent marker on it, the components do not know that resending the message is going to have the desired effect (which is why web browsers typically display a warning if you try to POST a form twice).
Locking yourself into a single method gives you a choice: do you want to support error recovery by intermediaries, or do you want the flexibility to support non-idempotent writes?
What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
Why not just make every client use the “GET” method w/ any/all relevant params, headers, requestbodies, JSON,etc. etc.?
On the server side, the response to each method is custom & independently coded!
For example, what difference does it make to issue a database query via GET instead of POST?
I understand that GET is for queries that don’t change the DB (or anything else?).
And POST is for calls that do make changes.
But, near as I can tell, the RESTful standard doesn’t prevent one to code up a server response to GET and issue a stored procedure call that indeed DOES change the DB.
Vice versa… the RESTful standard doesn’t prevent one to code up a server response to POST and issue a stored procedure call that indeed does NOT change ANYTHING!
I’m not arguing that a midtier (HTTP) “RESTlike” layer is necessary. It clearly is.
Let's say I'm wrong (and I may be). Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
The following do not directly address my questions but merely dance uncomfortably around it like an acidhead stoner at a Dead concert:
Different Models for RESTful GET and POST
RESTful - GET or POST - what to do?
GET vs POST in REST Web Service
PUT vs POST in REST
I just spent ~80 hours trying to communicate a PATCH to my REST server (older Android Java doesn't recognize the newer PATCH, so I had to use a stupid kluge: an X-HTTP-Method-Override header). A POST would have worked fine but the sysop wouldn't budge because he respects REST.
I just don’t understand why to bother with each individual method. They don't seem to have much impact on Idempotence. They seem to be mere guidelines. And if you "violate" these "guidelines" they give someone else a chance to point a feckless finger at you. But so what?
Aren't these guidelines more trouble than they're worth?
I'm just confused. Please excuse the stridency of my post.
Aren’t REST GET/POST/etc. methods superfluous?
What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
First, a clarification. Those aren't RESTful methods; those are HTTP methods. The web is a reference implementation (for the most part) of the REST architectural style.
Which means that the authoritative answers to your questions are documented in the HTTP specification.
But, near as I can tell, the RESTful standard doesn’t prevent one to code up a server response to GET and issue a stored procedure call that indeed DOES change the DB.
The HTTP specification designates certain methods as being safe. Casually, this designates that a method is read only; the client is not responsible for any side effects that may occur on the server.
The purpose of distinguishing between safe and unsafe methods is to allow automated retrieval processes (spiders) and cache performance optimization (pre-fetching) to work without fear of causing harm.
But you are right, the HTTP standard doesn't prevent you from changing your database in response to a GET request. In fact, it even calls out specifically a case where you may choose to do that:
a safe request initiated by selecting an advertisement on the Web will often have the side effect of charging an advertising account.
The HTTP specification also designates certain methods as being idempotent:
Of the request methods defined by this specification, PUT, DELETE, and safe request methods are idempotent.
The motivation for having idempotent methods? Unreliable networks.
Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response.
Note that the client here might not be the user agent, but an intermediary component (like a reverse proxy) participating in the conversation.
Thus, if I'm writing a user agent, or a component, that needs to talk to your server, and your server conforms to the definition of methods in the HTTP specification, then I don't need to know anything about your application protocol to know how to correctly handle lost messages when the method is GET, PUT, or DELETE.
On the other hand, POST doesn't tell me anything, and since the unacknowledged message may still be on its way to you, it is dangerous to send a duplicate copy of the message.
Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
Absolutely -- remember, the reference implementation of hypermedia is HTML, and HTML doesn't include support for PUT or DELETE. If you want to afford a hypermedia control that invokes an unsafe operation, while still conforming to the HTTP and HTML standards, then POST is your only option.
Aren't these guidelines more trouble than they're worth?
Not really? They offer real value in reliability, and the extra complexity they add to the mix is pretty minimal.
I just don’t understand why to bother with each individual method. They don't seem to have much impact on idempotence.
They don't impact it, they communicate it.
The server already knows which of its resources are idempotent receivers. It's the client and the intermediary components that need that information. The HTTP specification gives you the ability to communicate that information for free to any other compliant component.
Using the maximally appropriate method for each request means that you can deploy your solution into a topology of commodity components, and it just works.
Alternatively, you can give up reliable messaging. Or you can write a bunch of custom code in your components to tell them explicitly which of your endpoints are idempotent receivers.
POST vs PATCH
Same song, different verse. If a resource supports OPTIONS, GET, and PATCH, then I can discover everything I need to know to execute a partial update, and I can do so using the same commodity implementation I use everywhere else.
Achieving the same result with POST is a whole lot more work. For instance, you need some mechanism for communicating to the client that POST has partial update semantics, and what media-types are accepted when patching a specific resource.
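For the PATCH case, the discovery step might look like the sketch below: read the Allow and Accept-Patch headers from an OPTIONS response and work out which patch formats, if any, the resource takes. The function name is made up, the headers are passed in as a plain dict, and the parsing is deliberately naive (it would trip over media type parameters containing commas).

```python
def patch_support(options_headers: dict) -> list:
    """Return the media types accepted for PATCH, or [] if PATCH is not allowed."""
    allowed = {m.strip().upper() for m in options_headers.get("Allow", "").split(",")}
    if "PATCH" not in allowed:
        return []
    accept_patch = options_headers.get("Accept-Patch", "")
    return [t.strip() for t in accept_patch.split(",") if t.strip()]
```

With POST there is no standard header that carries the same information, so the client has to learn it out of band.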
What do I lose by making each call on the client GET and the server honoring such just by paying attention to the request and not the method?
Conforming user-agents are allowed to assume that GET is safe. If you have side effects (writes) on endpoints accessible via GET, then the agent is allowed to pre-fetch the endpoint as an optimization -- the side effects start firing even though nobody expects it.
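A toy illustration of that failure mode, with everything in it (BadServer, the /delete/ paths, the prefetch function) invented for the example: the server hangs a destructive action off GET, and an agent that speculatively fetches the links it sees wipes the data without anyone clicking anything.

```python
class BadServer:
    """Toy server that violates HTTP semantics: a GET that deletes."""

    def __init__(self):
        self.items = {"1": "draft", "2": "report"}

    def handle(self, method, path):
        if method == "GET" and path.startswith("/delete/"):
            # Side effect on a method that agents are allowed to treat as safe.
            self.items.pop(path.rsplit("/", 1)[-1], None)
            return 200
        return 404

def prefetch(server, links):
    """A conforming agent may speculatively GET every link it sees."""
    for path in links:
        server.handle("GET", path)
```

Calling prefetch(server, ["/delete/1", "/delete/2"]) leaves server.items empty, even though no user asked for anything to be deleted.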
If the endpoint isn't an idempotent receiver, then you have to consider that the GET calls can happen more than once.
Furthermore, the user agent and intermediary components are allowed to make assumptions about caching -- requests that you expect to get all the way through to the server don't, because conforming components along the way are permitted to serve replies out of their own cache.
To ice the cake, you are introducing an additional risk: undefined behavior.
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
Where I believe you are coming from, though I'm not certain, is more of an RPC point of view. Client sends a message, server responds; so long as both participants in the conversation have a common understanding of the semantics of the message, does it matter if the text in the message says "GET" or "POST" or "PATCH"? Of course not.
RPC is a fantastic choice when it fits the problem you are trying to solve.
But...
RPC at web scale is hard. Can your team deliver that? Can your team deliver it cost-effectively?
On the other hand, HTTP at scale is comparatively simple; there's an enormous ecosystem of goodies, using scalable architectures, that are stable, tested, well understood, and inexpensive. The tires are well and truly kicked.
You and your team hardly have to do anything; a bit of block and tackle to comply with the HTTP standards, and from that point on you can concentrate on delivering business value while you fall into the pit of success.