Should a Netflix or Twitter-style web service use REST or SOAP? [closed]

Closed 9 years ago.
I've implemented two REST services: Twitter and Netflix. Both times, I struggled to find the use and logic involved in the decision to expose these services as REST instead of SOAP. I hope somebody can clue me in to what I'm missing and explain why REST was used as the service implementation for services such as these.
Implementing a REST service takes infinitely longer than implementing a SOAP service. Tools exist for all modern languages/frameworks/platforms to read in a WSDL and output proxy classes and clients. Implementing a REST service is done by hand and - get this - by reading documentation. Furthermore, while implementing these two services, you have to make "guesses" as to what will come back across the pipe as there is no real schema or reference document.
Why write a REST service that returns XML anyway? The only difference is that with REST you don't know the types each element/attribute represents - you are on your own to implement it and hope that one day a string doesn't come across in a field you thought was always an int. SOAP defines the data structure using the WSDL so this is a no-brainer.
I've heard the complaint that with SOAP you have the "overhead" of the SOAP Envelope. In this day and age, do we really need to worry about a handful of bytes?
I've heard the argument that with REST you can just pop the URL into the browser and see the data. Sure, if your REST service is using simple or no authentication. The Netflix service, for instance, uses OAuth which requires you to sign things and encode things before you can even submit your request.
Why do we need a "readable" URL for each resource? If we were using a tool to implement the service, do we really care about the actual URL?

A canary in a coal mine.
I have been waiting for a question like this for close to a year now. It was inevitable that this day would come and I am sure we are going to see many more questions like this in the coming months.
The warning signs
You are absolutely correct, it does take longer to build RESTful clients than SOAP clients. The SOAP toolkits take away lots of boilerplate code and make client proxy objects available with almost no effort. With a tool like Visual Studio and a server URL I can be accessing remote objects of arbitrary complexity, locally in under five minutes.
Services that return application/xml and application/json are so annoying for client developers. What are we supposed to do with that blob of data?
Fortunately, lots of sites that provide REST services also provide a bunch of client libraries so that we can use those libraries to get access to a bunch of strongly typed objects. Seems kind of dumb though. If they had used SOAP we could have code-gen’d those proxy classes ourselves.
SOAP overhead, ha. It’s latency that kills. If people are really concerned about the number of excess bytes going across the wire then maybe HTTP is not the right choice. Have you seen how many bytes are used by the user-agent header?
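To put the "handful of bytes" in perspective, here is a rough Python sketch that measures the fixed cost of a minimal SOAP 1.1 envelope and compares it with a typical browser User-Agent header. The payload and the User-Agent string are made-up examples. A real SOAP message usually also carries a Header element and extra namespace declarations, so treat the envelope figure as a lower bound.

```python
# Minimal SOAP 1.1 envelope (no soap:Header) wrapped around an arbitrary payload.
ENVELOPE_PREFIX = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
)
ENVELOPE_SUFFIX = "</soap:Body></soap:Envelope>"


def wrap(payload: str) -> str:
    """Wrap an XML payload in the minimal envelope above."""
    return ENVELOPE_PREFIX + payload + ENVELOPE_SUFFIX


def envelope_overhead(payload: str) -> int:
    """Extra UTF-8 bytes the envelope adds on top of the payload itself."""
    return len(wrap(payload).encode("utf-8")) - len(payload.encode("utf-8"))


# Illustrative values only.
payload = "<GetUser><Id>42</Id></GetUser>"
user_agent = (
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

print("envelope overhead:", envelope_overhead(payload), "bytes")
print("User-Agent header:", len(user_agent.encode("utf-8")), "bytes")
```

The two numbers land in the same ballpark, which is the point being made: per-request latency, not a fixed wrapper of this size, is usually what dominates.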
Yeah, have you ever tried using a web browser as a debugging tool for anything other than HTML and JavaScript? Trust me, it sucks. You can only use two of the verbs, the caching is constantly getting in the way, the error handling swallows so much information, and it’s constantly looking for a goddamn favicon.ico. Just shoot me.
Readable URL. Only nouns, no verbs. Yeah, that’s easy as long as we are only doing CRUD operations and we only need to access a hierarchy of objects in one way. Unfortunately most applications need a wee bit more functionality than that.
The impending disaster
There are a metric boatload of developers currently developing applications that integrate with REST services who are in the process of coming to the same set of conclusions that you have. They were promised simplicity, flexibility, scalability, evolvability and the holy grail of serendipitous reuse. The characteristics of the web itself: how could things go wrong?
However, they are finding that versioning is just as much of a problem, except that the compiler doesn’t help detect issues. The hand-written client code is a pain to maintain as the data structures evolve and URLs get refactored. Designing APIs around just nouns and four verbs can be really hard, especially with RESTful URL zealots telling you when you can and cannot use query strings.
Developers are going to start asking why we are wasting our effort on supporting both JSON and XML formats. Why not just focus our efforts on one and do it well?
How did things go so wrong
I’ll tell you what went wrong. We as developers let the marketing departments take advantage of our primary weakness. Our eternal search for the silver bullet blinded us to the reality of what REST really is. On the surface REST seems so easy and simple. Name your resources with URLs and use GET, PUT, POST and DELETE. Hell, we devs already know how to do that; we have been dealing for years with databases that have tables and columns and SQL statements that have SELECT, INSERT, UPDATE and DELETE. It should have been a piece of cake.
There are other parts of REST that some people discuss, such as self-descriptiveness and the hypermedia constraint, but these constraints are not as simple as resource identification and the uniform interface. They seem to add complexity where the desired goal is simplicity.
This watered-down version of REST became validated in developer culture in many ways. Server frameworks were created that encouraged resource identification and the uniform interface, but did nothing to support the other constraints. Terms started to float around differentiating the approaches (HI-REST vs LO-REST, Corporate REST vs Academic REST, REST vs RESTful).
A few people screamed out that if you don’t apply all of the constraints, it’s not REST. You will not get the benefits. There is no half REST. But those voices were labelled as religious zealots who were upset that their precious term had been stolen from obscurity and made mainstream: jealous people trying to make REST sound more difficult than it is.
REST, the term, has definitely become mainstream. Almost every major web property that has an API supports "REST". Twitter and Netflix are two very high-profile ones. The scary thing is that I can only think of one public API that is self-descriptive, and there are only a handful that truly implement the hypermedia constraint. Sure, some sites like StackOverflow and Gowalla support links in their responses, but there are huge gaping holes in their links. The StackOverflow API has no root page. Imagine how successful the web site would have been if there was no home page for the web site!
You were misled I’m afraid
If you have made it this far, the short answer to your question is that those APIs (Netflix and Twitter) do not conform to all of the constraints, and therefore you will not get the benefits that REST APIs are supposed to bring.
REST clients do take longer to build than SOAP clients, but they are not tied to one specific service, so you should be able to reuse them across services. Take the classic example of a web browser. How many services can a web browser access? What about a Feed Reader? Now how many different services can the average Twitter client access? Yes, just one.
REST clients are not supposed to be built to interface with a single service; they are supposed to be built to handle specific media types that could be served by any service. The obvious question to that is: how can you build a REST client for a service that delivers application/json or application/xml? Well, you can’t. That’s because those formats are completely useless to a REST client. You said it yourself,
you have to make "guesses" as to what will come back across the pipe as there is no real schema or reference document
You are absolutely correct for services like Twitter. However, the self-descriptive constraint in REST says that the HTTP content type header should describe exactly the content that is being transmitted across the wire. Delivering application/json and application/xml tells you nothing about the content.
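A minimal Python sketch of a client that takes the self-descriptive constraint seriously: it chooses what to do purely from the Content-Type header. The vendor media types and handler functions are hypothetical; neither Twitter nor Netflix defines them.

```python
from typing import Callable, Dict


# Hypothetical handlers for hypothetical media types.
def handle_tweet(body: bytes) -> str:
    return "render as a tweet"


def handle_user(body: bytes) -> str:
    return "render as a user profile"


HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "application/vnd.example.tweet+json": handle_tweet,
    "application/vnd.example.user+json": handle_user,
}


def dispatch(content_type: str, body: bytes) -> str:
    """Choose a handler from the Content-Type header alone."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        # A generic type like application/json says nothing about what the
        # document means, so the client has no basis for choosing a handler.
        raise ValueError(f"unsupported media type: {media_type}")
    return handler(body)


print(dispatch("application/vnd.example.tweet+json; charset=utf-8", b"{}"))
```

With only application/json to go on, the lookup fails, and the client is back to the "guessing" described above.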
When it comes to considering the performance of REST-based systems, it is necessary to look at the bigger picture. Talking about envelope bytes is like talking about loop unwinding when comparing a quicksort to a Shell sort. There are scenarios where SOAP can perform better, and there are scenarios where REST can perform better. Context is everything.
REST gains much of its performance advantage by being very flexible about what media types it supports and by having sophisticated support for caching. For caching to work well though nearly all of the constraints must be adhered to.
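One concrete piece of that caching support is the conditional GET. The Python sketch below is a simplified server-side view: the representation's ETag is derived from a hash of its bytes (one common choice, not the only one), and a matching If-None-Match lets the server answer 304 without resending the body. Weak validators (W/"...") and other RFC details are deliberately ignored.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the representation's bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match is not None:
        # Simplified comparison: exact match on any listed tag, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            # The client's cached copy is still valid, so no body is sent.
            return 304, headers, b""
    return 200, headers, body


status, headers, _ = conditional_get(b"<order id='17'/>", None)
print(status, headers["ETag"])
print(conditional_get(b"<order id='17'/>", headers["ETag"])[0])
```

This only works when GET really is safe and the response really depends on nothing but the URI and headers, which is why caching leans on the other constraints.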
Your last point about readable urls is by far the most ironic. If you truly commit to the hypermedia constraint, then every URL could be a GUID and the client developer would lose nothing in readability.
The fact that URIs should be opaque to the client is one of the most important things when developing REST systems. Readable URLs are convenient for the server developer, and well-structured URLs make it easier for the server framework to dispatch requests, but those are implementation details that should have no impact on the developers consuming the API.
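A minimal Python sketch of what the hypermedia constraint looks like from the client side. The in-memory SERVER dictionary, its GUID-like URIs and the link relation names are all hypothetical. The client only knows the entry point and the relation names, so the server can change every other URI without breaking it.

```python
from typing import Dict

# Hypothetical "server": opaque URIs, each representation lists its links by rel.
SERVER: Dict[str, dict] = {
    "/": {"links": {"orders": "/3f2a9c1e"}},
    "/3f2a9c1e": {"links": {"latest": "/b71d04aa"}, "count": 2},
    "/b71d04aa": {"links": {}, "id": 17, "status": "shipped"},
}


def get(uri: str) -> dict:
    """Stand-in for an HTTP GET against the server."""
    return SERVER[uri]


def follow(start: str, *rels: str) -> dict:
    """Walk from the entry point by link relation, never building URIs."""
    doc = get(start)
    for rel in rels:
        # Raises KeyError if the server stops offering this relation.
        doc = get(doc["links"][rel])
    return doc


print(follow("/", "orders", "latest")["status"])  # shipped
```

If the server renamed /3f2a9c1e to anything else and updated the link in its root document, this client would keep working unchanged.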
The Twitter API is not even close to being RESTful, and that is why you are unable to see any benefit to using it over SOAP. The Netflix API is much closer, but its use of generic media types demonstrates that failing to adhere to even a single constraint can have a profound impact on the benefits derived from the service.
It may not be all their fault
I’ve done a whole lot of dumping on the service providers, but it takes two to dance RESTfully. A service may follow all of the constraints religiously and a client can still easily undo all of the benefits.
If a client hard-codes URLs to access certain types of resources, then it is preventing the server from changing those URLs. Any kind of URL construction based on implicit knowledge of how the service structures its URLs is a violation.
Making assumptions about what type of representation will be returned from a link can lead to problems. Making assumptions about the content of the representation based on knowledge that is not explicitly stated in the HTTP headers is definitely going to create coupling that will cause pain in the future.
Should they have used SOAP?
Personally, I don’t think so. REST done right allows a distributed system to evolve over the long term. If you are building distributed systems that have components that are developed by different people and need to last for many years, then REST is a pretty good option.

SOAP is an object-oriented, remote procedure call technology stack. It works by building a new abstraction on top of an existing protocol (HTTP).
REST is a document-oriented approach that simply uses the features of an existing protocol (HTTP). "REST" is just a buzzword -- the concept is this: Just use the web the way it was designed to work!
In response to edits to question:
"Implementing a REST service takes infinitely longer than implementing a SOAP service."
Um, no, it can't be infinitely longer. And in cases where what you are trying to retrieve is already a document or file, it's actually much faster. For example, the OGC spec for WMS (Web Mapping Service) defines both a SOAP and REST version of the protocol, and there's a reason why almost nobody implements the SOAP version -- it's because if you're trying to get a map, it's a lot easier to just build a URL and fetch image bytes from that URL than it is to bother with encapsulating it into a SOAP message. But yes, I will agree that if the point of the web service is to transfer some strongly-typed object in a domain object model, SOAP is better suited for that use.
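For the map case, "just build a URL" can be as small as the Python sketch below. The parameter names are the standard WMS 1.3.0 GetMap key-value parameters; the base URL, layer name and bounding box are hypothetical.

```python
from urllib.parse import urlencode


def wms_getmap_url(base: str, layer: str, bbox: tuple, width: int, height: int) -> str:
    """Build a WMS 1.3.0 GetMap URL; a plain GET on it returns image bytes."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty means the layer's default style
        "CRS": "EPSG:4326",
        # With EPSG:4326 in WMS 1.3.0 the axis order is lat,lon:
        # minlat,minlon,maxlat,maxlon.
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base + "?" + urlencode(params)


url = wms_getmap_url(
    "https://maps.example.com/wms", "roads", (40.0, -75.0, 41.0, -74.0), 512, 512
)
print(url)
```

Fetching it is then a single call such as urllib.request.urlopen(url).read(), which is exactly the kind of URL you can also paste into a browser to check the result.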
"Why write a REST service that returns XML anyway?"
Well, yes, that can be silly. But it depends on what the XML is. If there's a clearly defined schema for it somewhere, then there's no ambiguity. For example, you can think of WSDL URLs as being a kind of RESTful web service for retrieving information about a web service. In this case, adding the overhead of another SOAP request would be pointless.
In general, REST wins when the content that is being transferred can be thought of as a file, as a single unit. SOAP wins when the content needs to be treated as an object with members.
"I've heard the complaint that with SOAP you have the "overhead" of the SOAP Envelope. In this day and age, do we really need to worry about a handful of bytes?"
Yes. Not in every circumstance, but there are sites with a great deal of traffic where it makes a difference. Is it enough of a difference to outweigh the semantic differences of using SOAP instead of REST? I doubt it. If you're doing an object remoting protocol and the number of bytes is making a difference, SOAP is probably not the tool for you anyway -- maybe you should be using CORBA or DCOM instead.
"I've heard the argument that with REST you can just pop the URL into the browser and see the data."
Yes, and this is a large argument in favor of REST if it makes sense to view the data in a browser. For example, with image data, it's an easy way to debug the service -- just paste the URL into your browser's address bar and see what the image looks like. Or if the data returned is in XML, and you have a referenced XML stylesheet that renders into readable HTML in the browser, then you get the benefit of semantic markup and easy visualization all in one package. But you are correct, this benefit mostly evaporates when working with more complex authentication schemes. If you can't encode all your authentication information into each HTTP request, then I would argue that it doesn't count as REST at all.
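A minimal Python sketch of "encoding all your authentication information into each HTTP request": every request carries its own credentials, so the server needs no session state to authenticate it. HTTP Basic is used here only because it is the simplest scheme to show (and should only travel over TLS); OAuth-style signing, as Netflix uses, follows the same per-request idea with more steps. The URL and credentials are hypothetical.

```python
import base64
from urllib.request import Request


def authed_request(url: str, user: str, password: str) -> Request:
    """Build a request that carries its own credentials (HTTP Basic)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


req = authed_request("https://api.example.com/images/7", "alice", "s3cret")
print(req.full_url, req.get_header("Authorization"))
```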
"Why do we need a "readable" URL for each resource? If we were using a tool to implement the service, do we really care about the actual URL?"
Well, it depends. Why do we need readable URLs for any resource on the web? You can read Tim Berners-Lee's essay Cool URIs Don't Change for the rationale, but basically, as long as the resource may still be useful in the future, the URI for that resource should stay the same.
Obviously, for transient resources (like the "today's Money" link in the essay) there is no need for it, since the need to reference the resource goes away if the corresponding resource goes away. But for more permanent resources (like StackOverflow questions, for example, or movies on IMDB), you want to have a URL that will work forever. When you're designing a web service, you need to decide if the resources themselves could outlive your service, and if so, then REST is probably the right way to go.
For the record, yes, I've been developing web pages since well before Netflix or Twitter existed. And no, I've not yet had any need or opportunity to implement a client for either Netflix's or Twitter's services. But even if their services are atrociously difficult to work with, that doesn't mean the technology they implemented their services on top of is bad -- only that those two implementations are bad.
To make a long story short: REST and SOAP are just tools. They each have strengths and weaknesses. If the only tool you have is a hammer, then every problem looks like a nail. So get to know both tools, and learn how to use them correctly, and then choose the right tool for each job.

An honest question deserves an honest answer. But first, why did you use the text of this question as an answer to another question if you did not think it was rhetorical in nature?
Anyway:
"Tools exist for all modern languages/frameworks/platforms to read in a WSDL and output proxy classes and clients. Implementing a REST service is done by hand by reading documentation."
Just like browser vendors have read and re-read the HTML 4.01 specification up and down to try to implement a consistent browsing experience. Have you reflected on the fact that browsers were invented long before internet banking and Stack Overflow, and yet you can use a browser to do just those things? This is possible for the sole reason that everybody agrees to use HTML (and related formats like CSS, JS, JPEG etc.).
Blogging is actually not that new, and someone came up with AtomPub, which allows any blogging software to access and update posts in a blog, much like any web browser can access any web page. That's pretty neat, and works because of the RESTful constraints imposed by the protocol.
But for Twitter and Netflix, there is no universal agreement that "all microblogs in existence shall use the media type application/tweet", mainly because microblogging is so new. Maybe in a few years' time a few microblogging services will settle on the same API so that Twitter, Facebook and Identica can interoperate. None of their existing APIs are anywhere near RESTful, however much they claim, so I don't expect it to happen real soon.
"Furthermore, while implementing these two services, you have to make "guesses" as to what will come back across the pipe as there is no real schema or reference document."
You've hit the nail on the head. REST is all about distributed hypermedia, and that pretty much sums it up. A browser looks at what it gets from a request and shows it to the user. An HTML page usually spawns a lot more GET requests, for example for CSS, scripts and images. An image is typically only rendered to the screen, JavaScript is executed, and so on. Each time, the browser does what it does because it found the link in an <img> or <style> tag and the response media type was image/jpeg or text/css.
If Twitter makes a hypermedia based API, it will probably always return an application/tweet every time you follow a link to a tweet, but the client should never assume it, and always check what it gets before acting on it.
"Why write a REST service that returns XML anyway?"
This all boils down to media types. Take HTML: if you see an element whose meaning you don't know, the HTML spec instructs you to ignore the tag and process its "body" if it has one. Likewise, the Atom spec instructs you to ignore unknown elements and foreign markup (from different namespaces) and not process the body (IIRC).
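The "must ignore" rule can be sketched in a few lines of Python with the standard library's ElementTree. The element names and the extension namespace are hypothetical, and real Atom documents live in the Atom namespace, which is left out here to keep the tags short.

```python
import xml.etree.ElementTree as ET

KNOWN = {"title", "author"}


def read_entry(xml_text: str) -> dict:
    """Extract the child elements we understand; skip everything else."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.tag in KNOWN:
            result[child.tag] = child.text
        # Unknown elements, including foreign-namespace ones (whose tags look
        # like "{http://example.com/ext}rating"), are ignored, not rejected.
    return result


doc = """<entry xmlns:x="http://example.com/ext">
  <title>Hello</title>
  <x:rating>5</x:rating>
  <author>Ann</author>
  <mood>happy</mood>
</entry>"""

print(read_entry(doc))  # {'title': 'Hello', 'author': 'Ann'}
```

Because the reader tolerates elements it has never seen, a server can add fields to the format later without breaking existing clients.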
Designing media types for generic problem domains (as in the HTML media type for the rich text problem domain) is very hard. Making media types for very narrow problem domains (like a tweet) is probably a lot easier. But it's always a good idea to design for extensibility and specify how clients (and servers) are supposed to react when they see elements or data items that don't match the spec. JPEG, for example, has application-specific segments (e.g. APP1), which are used to hold all sorts of metadata.
"I've heard the complaint that with SOAP you have the "overhead" of the SOAP Envelope. In this day and age, do we really need to worry about a handful of bytes?"
No, we don't. REST is absolutely not about being efficient over the wire; it actually trades wire efficiency away. REST's efficiency comes from the possibilities of caching enabled by all the other constraints. Fielding's dissertation notes: "The trade-off, though, is that a uniform interface degrades efficiency, since information is transferred in a standardized form rather than one which is specific to an application's needs. The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction." I don't think that the SOAP Envelope byte count overhead is a valid concern.
"I've heard the argument that with REST you can just pop the URL into the browser and see the data."
Yes, that's also an invalid argument. It doesn't work that way. Even if it did work, most narrow REST APIs out there use media types that browsers have no idea about and it still won't work.
But there are many more ways than a browser to test an HTTP-based API, like command line utilities or browser extensions that allow you to control almost any aspect of an HTTP request, inspect response headers and discover links for you to follow. But even so, this is nowhere near as easy as generating WSDL stubs and writing a three-line program to call the function.
"Why do we need a "readable" URL for each resource? If we were using a tool to implement the service, do we really care about the actual URL?"
If you look at how the web works, I'm pretty sure that humans are by and large glad that the URI for a Wikipedia page looks like http://en.wikipedia.org/wiki/Stack_overflow instead of http://en.wikipedia.org/wiki/?oldid=376349090. But it actually is not important to REST. The important thing to get right is to put into the URI relevant data that is not likely to change. You might think that the database ID will never change, but what happens when two data sets need to be merged? All your primary keys change. The page title (Stack_overflow) will not change.
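A small Python sketch of deriving a URI segment from a stable natural key (the title) rather than from a database ID that a merge could renumber. The rule used here (collapse whitespace to underscores, capitalise the first letter) is a simplification in the spirit of the Stack_overflow example, not Wikipedia's actual title normalisation, and it does no percent-encoding.

```python
import re


def title_slug(title: str) -> str:
    """Turn a page title into a URI segment such as "Stack_overflow"."""
    slug = re.sub(r"\s+", "_", title.strip())
    return slug[:1].upper() + slug[1:]


print(title_slug("stack overflow"))  # Stack_overflow
```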
Sorry for the long response, but I believe this question is valid, and hasn't been addressed before here on SO. I'm sure Darrel Miller will add his answer once he's back too.

Martin Fowler has a post on the Richardson Maturity Model which does a great job explaining the difference between SOAP and REST.

WSDL and other document level protocols are redundant. The HTTP protocol supports a much richer set of operations besides just serving documents and submitting forms.
Supporters of REST are uncomfortable with that redundancy.


Terminology question: API somewhere between SOAP and REST - what is the name for them?

My understanding of SOAP vs REST:
REST = JSON, simple consistent interface, gives you CRUD access to 'entities' (Abstractions of things which are not necessarily single DB rows), simpler protocol, no formally enforced 'contract' (e.g. the values an endpoint returns could change, though it shouldn't)
SOAP = XML, more complex interface, gives you access to 'services' (specific operations you can apply to entities, rather than allowing you to CRUD entities directly), formally enforced, pre-stated 'contract' (like a WSDL, where e.g. the return types are predefined and formalized)
Is that a broadly correct assessment?
What about a mixture?
If so, what do I call an API that is a mixture?
For example, say we have what at surface level looks like a REST API (it returns JSON and has no WSDL or formalized contract), but instead of giving you access to the 'entities' that the system manages (User, product, comment, etc.), it gives you specific access to services and complex operations (/sendUserAnUpdate/1111, /makeCommentTextPurple/3333, /getAllCommentsByUserThisYear/2222) without full coverage.
The 'services' already exist internally, and the team simply publishes access to them on a request by request basis, through what would otherwise look like a REST API.
Question:
What is the 'mixture' typically referred to as (besides, maybe, a bad API). Is there a word for it? or a concept I can refer to that'll make most developers understand what I'm referring to, without having to say the entire paragraph I did above?
Is it just "JSON SOAP API?", "A Service-based REST API?" - what would you call it?
Thanks!
If you take a look at all those so-called REST APIs, your observation might seem true, though REST actually is something completely different. It describes an architecture or a philosophy whose intent is to decouple clients from servers, allowing the latter to evolve in the future without breaking clients. It is quite similar to typical Web page interaction in that a server teaches a client what it needs and only reacts to client-triggered requests. One has to be pretty careful and pedantic when designing REST services, as it is too easy to introduce coupling that may affect clients when a change is made, especially with all the pragmatism around in (commercial) software engineering. Stefan Tilkov gave a great talk on REST back in 2014 that, alongside talks by Jim Webber or Asbjørn Ulsberg, can serve as an introduction to what REST is at its core.
The general premise in REST should always be that a server teaches clients what they need and what a server expects and offers choices to the client via links. If the server expects to receive data from the client it will send a form-esque representation to inform the client about the respective fields it supports and based on the affordance of the respective elements contained in the form a client knows whether to select one or multiple options, enter some free text or enter a date value and such. Unfortunately, most of the media-type formats that attempt to mimic HTML's forms are still in draft versions.
If you take a look at HTML forms in particular, you might sense what I'm referring to. Each of the elements that may occur inside a form is well defined to avoid ambiguity and improve interoperability. This is de facto the ultimate goal in REST: having one client that is able to interact with a vast number of other services without having to be adapted to each single API explicitly.
The beauty of REST is that it isn't limited to a single representation format, e.g. JSON; in fact there is an almost infinite number of possible representation formats that could be exchanged in a REST environment. Plain application/json is a terrible media type for REST applications IMO, as it doesn't include any definitions in regards to links and forms and doesn't describe the semantics of certain fields that may be shipped in requests and responses. The lack of semantic description usually leads to typed resources, where a recipient expects that receiving data from e.g. /api/users returns some specific user data, which may differ from host to host. If you skim through IANA's media type registry, you will find a couple of media-type formats you could use to transfer user-related data, and any client supporting these representation formats would be able to interact with this endpoint without any issues. Fielding himself claimed that
A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state, or in defining extended relation names and/or hypertext-enabled mark-up for existing standard media types. Any effort spent describing what methods to use on what URIs of interest should be entirely defined within the scope of the processing rules for a media type (and, in most cases, already defined by existing media types). (Source)
Through content-type negotiation, client and server negotiate a representation format that both support and understand. The question therefore shouldn't be which one to support but how many you want to support. The more media types your API or client is able to exchange payloads in, the more likely it is to be able to interact with other participants.
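A simplified Python sketch of server-driven negotiation: parse the client's Accept header and pick the supported media type it rates highest. It handles q-values and the */* and type/* wildcards, but deliberately ignores the HTTP rule that a more specific range overrides a less specific one, so treat it as an illustration rather than a compliant parser. The media types in the example are just sample values.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_range:
            ranges.append((media_range, q))
    return ranges


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the server-supported type the client rates highest (q > 0)."""
    best, best_q = None, 0.0
    for media_type in available:
        for media_range, q in parse_accept(accept):
            if matches(media_range, media_type) and q > best_q:
                best, best_q = media_type, q
    return best


print(negotiate("application/hal+json;q=0.9, application/json;q=0.5, */*;q=0.1",
                ["application/json", "application/hal+json"]))
```

Returning None corresponds to the server having nothing the client accepts, which in HTTP is typically answered with 406 Not Acceptable.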
Most of those so-called REST APIs are in reality just RPC services exposed via HTTP that may or may not respect and support certain HTTP operations. HTTP thereby is just a transport layer whose domain is the transfer of files or data over the Web. Plenty of people still believe that you shouldn't put verbs in URIs, when in reality a script or process usually doesn't (and shouldn't) care whether a URI contains a verb or not. The URI itself is just a pointer a client will follow and invoke when it is interested in receiving the payload. We humans are also not that interested in the URI itself compared to the content it may return once invoked. The same holds true for arbitrary clients. What you ship along with that URI is more important. On the Web, a link can be annotated with certain text and/or link relation names that set the link's content in relation to the current page. A relation may hint to a client that certain content can be fetched before the whole response has been parsed, as it is quite likely that the client will want it too. preload, for example, is such a link-relation name. If certain domain-specific terms exist, one might use an extension relation type as defined by Web Linking, or reuse common knowledge or special microformats.
The whole interaction in a REST environment is similar to playing a text-based computer game or following a certain process flow (e.g. ordering and paying for products) defined by an application domain protocol, which can be designed as a state machine. The client is therefore guided through the whole process. It basically just follows the orders the server gives it, with some choices to break out of the process (e.g. cancelling the order before paying).
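The ordering flow can be sketched in Python as a small state machine. The states, relation names and the TRANSITIONS table are hypothetical; in a real REST system the client would discover the available relations from links in each response, with the table here standing in for what the server would advertise.

```python
# Hypothetical ordering protocol: each state lists the link relations
# (transitions) the server would offer in that state.
TRANSITIONS = {
    "cart": {"checkout": "awaiting_payment", "cancel": "cancelled"},
    "awaiting_payment": {"pay": "paid", "cancel": "cancelled"},
    "paid": {"ship": "shipped"},
    "shipped": {},
    "cancelled": {},
}


def available_links(state: str) -> list:
    """What the server would advertise as links in this state."""
    return sorted(TRANSITIONS[state])


def transition(state: str, rel: str) -> str:
    """Take a transition only if the current state offers it."""
    if rel not in TRANSITIONS[state]:
        raise ValueError(f"'{rel}' is not offered in state '{state}'")
    return TRANSITIONS[state][rel]


state = "cart"
for rel in ["checkout", "pay"]:
    print(state, "offers", available_links(state))
    state = transition(state, rel)
print("final state:", state)
```

Note that "cancel" simply stops being offered once the order is paid, which is the "break out before paying" choice described above.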
SOAP, on the other hand, is, as you've stated, an XML-based RPC protocol reusing a subset of HTTP to exchange requests and responses. When you change something within your WSDL, the likelihood that plenty of clients have to be adapted and recompiled is quite high. SOAP even defines its own security mechanism instead of reusing TLS, which therefore requires explicit support by clients. As you have a one-to-one communication model due to the state that may be kept in process, scaling SOAP services isn't that easy. In a REST environment this is just a matter of adding a load balancer in front of the server and then mirroring the server n times. The load balancer can send the request to any of the servers due to the stateless constraint.
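A toy Python sketch of why statelessness makes that load balancer trivial: because no server keeps per-client session state, plain round-robin is enough, and consecutive requests from the same client can land on different servers. The server names and requests are hypothetical.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Send each request to the next server in turn.

    This is only safe because every request carries everything the server
    needs (the stateless constraint), so any mirror can handle it."""

    def __init__(self, servers: Iterable[str]):
        self._servers = cycle(servers)

    def route(self, request: str) -> str:
        server = next(self._servers)
        return f"{server} handled {request}"


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
for i in range(4):
    print(lb.route(f"GET /orders/{i}"))
```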
What is the 'mixture' typically referred to as (besides, maybe, a bad API). Is there a word for it? or a concept I can refer to that'll make most developers understand what I'm referring to, without having to say the entire paragraph I did above?
Is it just "JSON SOAP API?", "A Service-based REST API?" - what would you call it?
The general term for an API that communicates on top of HTTP would be Web API or HTTP API IMO. This article also uses this term. It also lists XML-RPC and JSON-RPC besides SOAP. I do agree with Voice, though, that you'll receive 5 answers from asking 4 people about the right term to use. While it would be convenient to have a term everyone agrees upon, reality shows that people are not that interested in a clear separation. Just look here on SO at the questions tagged with rest. There is nothing wrong with not being "RESTful", though one should avoid the term REST for truly RPC services. Still, I think we are already in a situation where the term REST can't be rescued from misuse and marketing purposes.
For something that requires external documentation to use, and that ships with its own custom, non-standardized representation format or just exposes CRUD for domain objects, I'd add -RPC to it, as this is more or less what it is at heart. So if the API sends JSON and the representation to expect is documented via Swagger or some other external documentation, JSON-RPC would probably be the most fitting name IMO.
To sum up this post, I hope I could shed some light on what REST truly is and how your observation is skewed by all those pragmatic attempts that unfortunately are RPC through and through. If you change something within their implementation, how many clients will break? In addition, you can't reuse the client that you've implemented for API A to interact with API B (of a different company or vendor) out of the box, and therefore have to either adapt your client or create a new one solely for that API. This is true RPC and therefore should be reflected in the name somehow to hint developers about future expectations. Unfortunately, the effort to name things properly, especially in regards to REST, seems already lost. There is a fine but tiny group who attempt to spread the true meaning, like Voice, Cassio and some others, though it is like tilting at windmills. The best advice here would be to first discuss naming conventions and what each participant understands by each term, and then agree on a naming scheme everyone accepts to avoid future confusion.
My understanding of SOAP vs REST
...
Is that a broadly correct assessment?
No.
REST is an "architectural style", which is to say a coordinated collection of architectural constraints. The World Wide Web is an example of an application built using the REST architectural style.
SOAP is a transport-agnostic message protocol specification based on the XML Information Set.
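To make the "message protocol" point concrete, here is a sketch that builds a SOAP 1.2 envelope with Python's standard `xml.etree.ElementTree`. Only the envelope namespace comes from the SOAP 1.2 spec; the `GetBook` operation and the `urn:example:books` namespace are hypothetical. The resulting string does not depend on the transport: it could travel in an HTTP POST body or over some other channel.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
ET.register_namespace("soap", SOAP_ENV)

def soap_envelope(operation, params, ns="urn:example:books"):
    """Wrap an operation call in a SOAP 1.2 Envelope/Body.
    The operation name and its namespace are hypothetical."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{ns}}}{name}").text = str(value)
    return ET.tostring(env, encoding="unicode")

print(soap_envelope("GetBook", {"id": 42}))
```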
If so, what do I call an API that is a mixture?
I don't think you are going to find an authoritative terminology here. Colloquially, you are likely to hear the broad umbrella term "web api" to describe an HTTP API that isn't "RESTful".
The whole space is rather polluted by semantic diffusion.

Can GET, PUT and PATCH be replaced with POST HTTP method?

POST, PUT, PATCH and GET are all different, with idempotency and safety being the key differences.
While writing RESTful APIs, I encountered guidelines on when and where to use each of the HTTP methods. Since I am using Java for the back-end implementation, I can control the behavior of the HTTP methods on the persistent data.
For example, GET v1/book/{id} can be replaced with POST v1/book (with "id" in the body); with that id I can perform a query on the DB, fetching that particular book (assuming a book with that id already exists).
Similarly, I can achieve the workings of PATCH and PUT with POST itself.
Now, coming to the question: why don't we just use POST instead of GET, PUT and PATCH almost every time, ALMOST, when we can control the idempotency and safety behavior in the back-end?
Or is it just a guideline mentioned in RESTful docs somewhere, or stated by Roy Fielding, that we are all blindly following? Even if these are just guidelines, what is the major idea behind them?
https://restfulapi.net/rest-put-vs-post/
https://restful-api-design.readthedocs.io/en/latest/methods.html
https://www.keycdn.com/support/put-vs-post
The above resources only mention what each method does or how they differ. The articles describe the workings as if they were mere guidelines; none of the docs online explain the reason behind them.
None of them says what would happen if I used POST instead of PUT, PATCH and GET. What would the side effects be? (as I can control their behaviors in the back-end)
HTTP methods are designed in a way that each method holds a specific responsibility. I would say that REST's rules are conventions, not obligations. A convention doesn't force us to follow the rules, but it is designed to make our code better. You can tweak things and use them your own way, but that would be a bad idea. In this case, if you perform all three actions with one method, it would create great confusion in the code (as the simple definition of POST is to create an object, and that is what everyone understands) and also degrade your coding standards.
I strongly discourage replacing three methods with one.
If you do that, you can't say you are "writing RESTful APIs".
Whoever knows the RESTful standard will be confused about the behaviour of your APIs.
If you fit the standard, then you will have an easier life.
After all, you have no real benefit in your approach.
HTTP is a transport protocol which, as its name suggests, is responsible for transferring data such as files or DB entries across the wire to or from a remote system. In version 0.9 you basically only had the GET operation at your disposal, while in HTTP 1.0 almost all of the current operations were added to the spec.
Each of these methods fulfills its own purpose. POST, for example, processes the payload according to the server's own semantics, whatever they may be. In theory, it could therefore be used for retrieving, updating or removing content. However, to a client it is basically unclear what a server actually does with the payload. There is no guarantee whether invoking a URI with that method is safe (i.e. leaves the remote resource unaltered) or not. Think of a crawler that just invokes any URI it finds, and one of the links is an order link or a link where you perform a payment. Do you really want a crawler to trigger one of your processes? The spec is rather clear that if something like that happens, the client must not be held accountable for it. So if a crawler follows one of your links that triggers such a process and thereby orders 10k products, you can't claim a refund from the crawler's maintainer.
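The safety and idempotency properties this answer relies on are fixed per method by the HTTP specification (RFC 9110, section 9.2), so a generic client can decide what to do from the method alone. A minimal sketch of a crawler's filter; the links are made up.

```python
# Method properties as defined by HTTP semantics (RFC 9110, section 9.2).
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}

def crawler_may_follow(method):
    """A well-behaved crawler only issues requests it may assume are safe."""
    return method.upper() in SAFE_METHODS

# Hypothetical links discovered on a page: (method, URI)
links = [("GET", "/products/1"), ("POST", "/orders"), ("GET", "/orders/submit?id=7")]
followed = [uri for method, uri in links if crawler_may_follow(method)]
print(followed)  # ['/products/1', '/orders/submit?id=7']
```

The last link shows the flip side: if a GET URI actually places an order, the crawler still follows it, and per the spec the fault lies with the server that exposed it, not with the client.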
In addition, a response to a GET request is cacheable by default. So if you request the same resource twice within a certain amount of time, chances are that the resource does not need to be fetched again the second (third, ...) time, as it can be reused from the cache. This can reduce the load on the server quite significantly if used properly.
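A toy sketch of the caching point, assuming a client-side cache that stores only GET responses for a fixed freshness lifetime (`max_age`) and drops the stored copy when a non-GET request targets the same URI. Real HTTP caches (RFC 9111) also evaluate Cache-Control, validators and much more; the class and the fake `origin` function are hypothetical.

```python
import time

class TinyHttpCache:
    """Toy client-side cache: only GET responses are stored, keyed by URI."""

    def __init__(self, fetch, max_age=60.0, clock=time.monotonic):
        self.fetch = fetch          # function (method, uri) -> response body
        self.max_age = max_age      # seconds a stored response counts as fresh
        self.clock = clock
        self.store = {}             # uri -> (stored_at, body)

    def request(self, method, uri):
        if method != "GET":
            # Non-GET requests always reach the origin and invalidate the stored copy.
            self.store.pop(uri, None)
            return self.fetch(method, uri)
        hit = self.store.get(uri)
        if hit and self.clock() - hit[0] < self.max_age:
            return hit[1]           # served from cache, no round trip
        body = self.fetch(method, uri)
        self.store[uri] = (self.clock(), body)
        return body

calls = []
def origin(method, uri):
    calls.append((method, uri))
    return f"{method} {uri} #{len(calls)}"

cache = TinyHttpCache(origin)
cache.request("GET", "/books/1")
cache.request("GET", "/books/1")   # reused from the cache
cache.request("POST", "/books/1")  # always forwarded
print(len(calls))  # 2
```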
Since you mentioned Fielding and REST: REST is an architectural style you should use if you have plenty of different clients connecting to your services which, furthermore, are not under your control. Plenty of so-called REST APIs don't adhere to REST, as they follow a simpler and more pragmatic RPC approach with external documentation such as Swagger and similar. REST's main focus is on decoupling clients from servers, which allows the latter to evolve freely without having to fear breaking clients. Clients, on the other hand, become more robust to changes.
Fielding defined only a few constraints a REST architecture has to adhere to. One of them is support for caching. Later on, though, Fielding wrote a well-cited blog post in which he explains what API designers have to consider before calling their API REST. True decoupling can only occur if all of the constraints are followed strictly. If a client violates these premises, it won't benefit from REST at all.
The main premise in REST is (and should always be): the server teaches clients what they need, and clients only use what they are served. In the browsable Web, the big cousin of REST, a server teaches a client, for example, what data it expects via HTML forms, and links are annotated with link-relation names to give the browser hints on when to invoke a URI. On a Web page, a trash bin icon may indicate a deletion while a pencil icon may indicate an edit link. Such visual hints are also called affordances. Visual hints may not be applicable in machine-to-machine communication, though link relations can still hint at what a linked resource provides. Think of a stylesheet link annotated with preload: with HTTP/2 the server could attempt to push such a resource, or with HTTP 1.1 the browser could already load that stylesheet while the page is still being parsed, to speed things up. In order to gain widespread knowledge of those meanings, such values should be standardized at IANA. Through custom extensions or certain microformats such as Dublin Core or the like, you may add new relation names that are too specific for common cases but are common to the domain itself.
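A sketch of how a client can select links by relation name instead of by URI, using Python's standard `html.parser`. `preload` and `edit` are registered IANA link relations; the page content and the class name are made up for illustration.

```python
from html.parser import HTMLParser

class LinkRelCollector(HTMLParser):
    """Collects <link>/<a> targets by their rel value, so a client can pick
    links by relation name instead of hard-coding URIs."""

    def __init__(self):
        super().__init__()
        self.links = {}  # rel -> list of hrefs

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel may hold several space-separated relation names
        for rel in (attrs.get("rel") or "").split():
            if href:
                self.links.setdefault(rel, []).append(href)

page = """<html><head>
<link rel="preload" href="/css/site.css" as="style">
</head><body><a rel="edit" href="/articles/1/edit">edit</a></body></html>"""

parser = LinkRelCollector()
parser.feed(page)
print(parser.links)  # {'preload': ['/css/site.css'], 'edit': ['/articles/1/edit']}
```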
The same holds true for the media types client and server negotiate. A generally applicable media type will probably reach wider acceptance than a tailor-made one that is only usable by a single company. The main aim here is to reach a point where the same media type can be reused across various domains and APIs. REST's vision is to have a minimal number of clients that are able to interact with a plethora of servers or APIs, similar to a browser that is able to interact with almost all Web sites.
Your ultimate goal in REST is that a user follows an interaction protocol you've set up, which could be something like an order process, playing a text game, or what not. By being given choices, a client progresses through a certain process that can easily be depicted as a state machine. It follows a kind of application-driven protocol by following URIs that caught the client's attention and by returning data it was taught about through a form-like representation. As, more or less, only standardized representation formats should be used, there is no need for out-of-band information on how to interact with the API.
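The state-machine idea can be sketched as a client that only knows relation names and follows whatever transitions the current representation offers. The representations, URIs and relation names are hypothetical, and everything is fetched through a stand-in for GET to keep it short; a real client would use the method the form or link advertises.

```python
# Hypothetical representations of an order process; each state lists the
# transitions (link relations) the server currently offers.
SERVER = {
    "/cart":       {"state": "cart",     "links": {"checkout": "/checkout/9"}},
    "/checkout/9": {"state": "checkout", "links": {"pay": "/payment/9", "cancel": "/cart"}},
    "/payment/9":  {"state": "paid",     "links": {}},
}

def get(uri):
    return SERVER[uri]   # stands in for an HTTP request returning a parsed representation

def run(entry_uri, preferences):
    """Follow the first offered link whose relation appears in `preferences`.
    The client knows relation names, never the URIs behind them."""
    visited = []
    doc = get(entry_uri)
    while True:
        visited.append(doc["state"])
        choice = next((rel for rel in preferences if rel in doc["links"]), None)
        if choice is None:
            return visited       # no applicable transition: process finished
        doc = get(doc["links"][choice])

print(run("/cart", ["pay", "checkout"]))  # ['cart', 'checkout', 'paid']
```

If the server later moves the payment step to another URI, only the `links` it returns change; the client code stays the same.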
In reality, though, plenty of enterprises don't really care about long-lasting APIs that are free to evolve over the years, but about short-term success. They usually also don't care that much whether they use the proper HTTP operations at all or stay within the bounds of the HTTP spec (e.g. sending payloads with HTTP GET requests). Their primary intent is to get the job done. Therefore pragmatism usually wins over design, and plenty of developers take the route of quick success and have to adapt their work later on, which is often cumbersome, as the API is now the driving factor of their business and they can't change it easily without having to revamp the whole design.
... why don't we just use POST instead of GET , PUT and PATCH almost every time, ALMOST, when we can control the idempotent and safety behavior in the back-end?
A server may know that a request is idempotent, but the client does not. Properties such as safety and idempotency are promises to the client. Whether the server satisfies them or not is a different story. How should a client know whether a sent payment request reached the server and only the response got lost, or whether the initial request didn't make it to the server at all due to a temporary connection issue? A PUT request is guaranteed to be idempotent, e.g. you don't want to order the same things twice if you resubmit the same request after a network issue. While the same request could also be sent via POST, with the server being smart enough not to process it again, the client doesn't know the server's behavior unless it is documented externally somewhere, which again somewhat violates REST principles. So, to state it differently, such properties are more or less promises to the client, less to the server.
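A small sketch of why the retry matters, assuming a toy in-memory store: resending an identical PUT still leaves one order, while resending an identical POST creates a second one. The class and the order ID `a1` are hypothetical.

```python
import itertools

class OrderStore:
    """Toy server state used to contrast retries of PUT and POST."""

    def __init__(self):
        self.orders = {}
        self._ids = itertools.count(1)

    def put(self, order_id, body):
        # PUT replaces the resource at a client-chosen URI: repeating it is harmless.
        self.orders[order_id] = body

    def post(self, body):
        # POST lets the server create a new resource each time it is processed.
        order_id = next(self._ids)
        self.orders[order_id] = body
        return order_id

store = OrderStore()
# The client saw no response, so it resends the identical request once.
for _ in range(2):
    store.put("a1", {"item": "book", "qty": 1})
for _ in range(2):
    store.post({"item": "book", "qty": 1})
print(len(store.orders))  # 3: one order from the PUTs, two from the POSTs
```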

RESTful API runtime discoverability / HATEOAS client design

For a SaaS startup I'm involved in, I am building both a RESTful web API and a couple of client apps on different platforms that consume it. I think I've got the API figured out, but now I'm turning to the clients. As I've been reading about REST, I see that a key part of REST is discovery, but there seems to be a lot of debate between two different interpretations of what discovery really means:
Developer discovery: The developer hard-codes copious amounts of API details into the client, such as resource URI's, query parameters, supported HTTP methods, and other details that they've discovered through browsing the docs and experimenting with the API's responses. This type of discovery IMHO necessitates cool linkage and the API versioning question, and leads to hard coupling of the client code to the API. Not much better than if using a well-documented collection of RPC's it seems.
Runtime discovery - The client app itself is able to figure out everything it needs with little or no out-of-band information (presumably, only a knowledge of the media types the API deals with.) Links can be hot. But to make the API very efficient, a lot of link templating for query parameters seems to be needed, which makes out-of-band info creep back in. There are possibly other difficulties I haven't thought of yet since I haven't gotten to that point in development. But I do like the idea of loose coupling.
Runtime discovery seems to be the holy grail of REST, but I'm seeing precious little discussion about how to implement such a client. Almost all REST sources I've found seem to assume Developer discovery. Anyone know of some Runtime discovery resources? Best practices? Examples or libraries with real code? I'm working in PHP (Zend Framework) for one client. Objective-C (iOS) for the other.
Is Runtime discovery a realistic goal, given the present set of tools and knowledge in the developer community? I can write my client to treat all of the URI's in an opaque manner, but how to do this most efficiently is a question, especially over low-bandwidth connections. Anyway, URI's are only part of the equation. What about link templating in the Runtime context? How about communicating what methods are supported, aside from making a lot of OPTIONS requests?
This is definitely a tough nut to crack. At Google, we've implemented our Discovery Service that all our new APIs are built against. The TL;DR version is we generate a JSON Schema-like spec that our clients can parse - many of them dynamically.
This results in easier SDK upgrades for the developer and easier/better maintenance for us.
It is by no means the perfect solution, but many of our devs seem to like it.
See link for more details (and make sure to watch the vid.)
Fascinating. What you are describing is basically the HATEOAS principle. What is HATEOAS you ask? Read this: http://en.wikipedia.org/wiki/HATEOAS
In layman's terms, HATEOAS means link following. This approach decouples your client from specific URL's and gives you the flexibility to change your API without breaking anyone.
You did your homework and you got to the heart of it: runtime discovery is the holy grail. Don't chase it.
UDDI tells a poignant story of runtime discovery: http://en.wikipedia.org/wiki/Universal_Description_Discovery_and_Integration
One of the requirements that should be satisfied before you can call an API 'RESTful' is that it should be possible to write a generic client application on top of that API. With the generic client, a user should be able to access all the API's functionality. A generic client is a client application that does not assume that any resource has a specific structure beyond the structure that is defined by the media type. For example, a web browser is a generic client that knows how to interpret HTML, including HTML forms etc.
Now, suppose we have a HTTP/JSON API for a web shop and we want to build a HTML/CSS/JavaScript client that gives our customers an excellent user experience. Would it be a realistic option to let that client be a generic client application? No. We want to provide a specific look-and-feel for every specific data element and every specific application state. We don't want to include all knowledge about these presentation-specifics in the API, on the contrary, the client should define the look and feel and the API should only carry the data. This implies that the client has hard-coded coupling of specific resource elements to specific layouts and user interactions.
Is this the end of HATEOAS and thus the end of REST? Yes and no.
Yes, because if we hard-code knowledge about the API into the client, we lose the benefit of HATEOAS: server-side changes may break the client.
No, for two reasons:
Being "RESTful" is a property of the API, not of the client. As long as it is possible, in theory, to build a generic client that offers all capabilities of the API, the API can be called RESTful. The fact that clients don't obey the rules, is not the API's fault. The fact that a generic client would have a lousy user experience is not an issue. Why is it important to know that it is possible to have a generic client, if we don't actually have that generic client? This brings me to the second reason:
A RESTful API offers clients the option to choose how generic they want to be, i.e. how resilient to server-side changes they want to be. Clients which need to provide a great user experience may still be resilient to URI changes, to changes in default values and more. Clients doing batch jobs without user interaction may be resilient to other kinds of changes.
If you are interested in practical examples, check out my JAREST paper. The last section is about HATEOAS. You will see that with JAREST, even highly interactive and visually attractive clients can be quite resilient to server-side changes, though not 100%.
I think the important point about HATEOAS is not that it is some holy grail client-side, but that it isolates the client from URI changes - it is assumed you are using known (or developer discovered custom) Link Relations that will allow the system to know which link for an object is the editable form. The important point is to use a media type that is hypermedia aware (e.g. HTML, XHTML, etc).
You write:
To make the API very efficient, a lot of link templating for query parameters seems to be needed, which makes out-of-band info creep back in.
If that link template is supplied in the previous response, then there is no out-of-band information. For example, an HTML search form uses link templating (/search?q=%#) to generate a URL (/search?q=hateoas), but nothing is known by the client (the web browser) other than how to use HTML forms and GET.
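The same point in code: a sketch, assuming a hypothetical form description delivered in the previous response, of how a generic client fills it in the way a browser handles a GET form. Only `urllib.parse.urlencode` is a real library call; the form structure is made up.

```python
from urllib.parse import urlencode

# Hypothetical form description, as it might arrive in the previous response
# (the equivalent of <form action="/search" method="get"><input name="q">).
form = {"action": "/search", "method": "GET", "fields": ["q"]}

def submit(form, values):
    """Build the request the way a browser does for a GET form:
    target and field names come from the form, values from the user."""
    query = urlencode({name: values[name] for name in form["fields"]})
    return form["method"], f"{form['action']}?{query}"

print(submit(form, {"q": "hateoas"}))  # ('GET', '/search?q=hateoas')
```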

What is the advantage of using REST instead of non-REST HTTP?

Apparently, REST is just a set of conventions about how to use HTTP. I wonder which advantage these conventions provide. Does anyone know?
I don't think you will get a good answer to this, partly because nobody really agrees on what REST is. The wikipedia page is heavy on buzzwords and light on explanation. The discussion page is worth a skim just to see how much people disagree on this. As far as I can tell however, REST means this:
Instead of having randomly named setter and getter URLs and using GET for all the getters and POST for all the setters, we try to have the URLs identify resources, and then use the HTTP actions GET, POST, PUT and DELETE to do stuff to them. So instead of
GET /get_article?id=1
POST /delete_article id=1
You would do
GET /articles/1/
DELETE /articles/1/
And then POST and PUT correspond to "create" and "update" operations (but nobody agrees which way round).
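On the server side, this resource-oriented style usually ends up as a routing table keyed by method and URI pattern. A minimal sketch with hypothetical routes and handler names; answering 405 for a known URI with an unsupported method and 404 for an unknown URI follows the usual HTTP status semantics.

```python
import re

# Hypothetical routes: resource-oriented URIs, the HTTP method selects the action.
ROUTES = [
    ("GET",    r"^/articles/(?P<id>\d+)/$", "read_article"),
    ("DELETE", r"^/articles/(?P<id>\d+)/$", "delete_article"),
    ("POST",   r"^/articles/$",             "create_article"),
]

def dispatch(method, path):
    """Return (handler name, URI parameters), or an HTTP-style error status."""
    path_matched = False
    for route_method, pattern, handler in ROUTES:
        m = re.match(pattern, path)
        if m:
            path_matched = True
            if route_method == method:
                return handler, m.groupdict()
    # Known resource but unsupported method -> 405, unknown resource -> 404.
    return (405 if path_matched else 404), {}

print(dispatch("DELETE", "/articles/1/"))  # ('delete_article', {'id': '1'})
```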
I think the caching arguments are wrong, because query strings are generally cached, and besides you don't really need to use them. For example django makes something like this very easy, and I wouldn't say it was REST:
GET /get_article/1/
POST /delete_article/ id=1
Or even just include the verb in the URL:
GET /read/article/1/
POST /delete/article/1/
POST /update/article/1/
POST /create/article/
In that case GET means something without side-effects, and POST means something that changes data on the server. I think this is perhaps a bit clearer and easier, especially as you can avoid the whole PUT-vs-POST thing. Plus you can add more verbs if you want to, so you aren't artificially bound to what HTTP offers. For example:
POST /hide/article/1/
POST /show/article/1/
(Or whatever, it's hard to think of examples until they happen!)
So in conclusion, there are only two advantages I can see:
Your web API may be cleaner and easier to understand / discover.
When synchronising data with a website, it is probably easier to use REST because you can just say synchronize("/articles/1/") or whatever. This depends heavily on your code.
However I think there are some pretty big disadvantages:
Not all actions easily map to CRUD (create, read/retrieve, update, delete). You may not even be dealing with object type resources.
It's extra effort for dubious benefits.
Confusion as to which way round PUT and POST are. In English they mean similar things ("I'm going to put/post a notice on the wall.").
So in conclusion I would say: unless you really want to go to the extra effort, or if your service maps really well to CRUD operations, save REST for the second version of your API.
I just came across another problem with REST: It's not easy to do more than one thing in one request or to specify which parts of a compound object you want to get. This is especially important on mobile, where round-trip time can be significant and connections are unreliable. For example, suppose you are getting posts on a Facebook timeline. The "pure" REST way would be something like
GET /timeline_posts // Returns a list of post IDs.
GET /timeline_posts/1/ // Returns a list of message IDs in the post.
GET /timeline_posts/2/
GET /timeline_posts/3/
GET /message/10/
GET /message/11/
....
Which is kind of ridiculous. Facebook's API is pretty great IMO, so let's see what they do:
By default, most object properties are returned when you make a query. You can choose the fields (or connections) you want returned with the "fields" query parameter. For example, this URL will only return the id, name, and picture of Ben:
https://graph.facebook.com/bgolub?fields=id,name,picture
I have no idea how you'd do something like that with REST, and if you did whether it would still count as REST. I would certainly ignore anyone who tries to tell you that you shouldn't do that though (especially if the reason is "because it isn't REST")!
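One way to look at it: the query string is part of the URI, so `?fields=...` can be seen as identifying a partial representation of the same resource. The sketch below shows only the server-side filtering; the resource data and the function name `render` are made up, and this is not Facebook's implementation.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stored resource.
USER = {"id": "bgolub", "name": "Ben", "picture": "https://example.com/ben.jpg",
        "hometown": "Springfield", "bio": "Engineer"}

def render(resource, url):
    """Return only the fields named in ?fields=a,b,c, or everything if absent."""
    params = parse_qs(urlsplit(url).query)
    if "fields" not in params:
        return dict(resource)
    wanted = params["fields"][0].split(",")
    return {k: resource[k] for k in wanted if k in resource}

print(render(USER, "/bgolub?fields=id,name,picture"))
```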
Simply put, REST means using HTTP the way it's meant to be.
Have a look at Roy Fielding's dissertation about REST. I think that every person that is doing web development should read it.
As a note, Roy Fielding is one of the key drivers behind the HTTP protocol, as well.
To name some of the advantages:
Simple.
You can make good use of HTTP cache and proxy server to help you handle high load.
It helps you organize even a very complex application into simple resources.
It makes it easy for new clients to use your application, even if you haven't designed it specifically for them (probably, because they weren't around when you created your app).
Simply put: NONE.
Feel free to downvote, but I still think there are no real benefits over non-REST HTTP. All current answers are invalid. Arguments from the currently most voted answer:
Simple.
You can make good use of HTTP cache and proxy server to help you handle high load.
It helps you organize even a very complex application into simple resources.
It makes it easy for new clients to use your application, even if you haven't designed it specifically for them (probably, because they weren't around when you created your app).
1. Simple
With REST you need an additional communication layer for your server-side and client-side scripts, so it's actually more complicated than using non-REST HTTP.
2. Caching
Caching can be controlled by HTTP headers sent by server. REST does not add any features missing in non-REST.
3. Organization
REST does not help you organize things. It forces you to use the API supported by the server-side library you are using. You can organize your application the same way (or better) when you are using a non-REST approach. E.g. see Model-View-Controller or MVC routing.
4. Easy to use/implement
Not true at all. It all depends on how well you organize and document your application. REST will not magically make your application better.
IMHO the biggest advantage that REST enables is that of reducing client/server coupling. It is much easier to evolve a REST interface over time without breaking existing clients.
Discoverability
Each resource has references to other resources, either in hierarchy or links, so it's easy to browse around. This is an advantage to the human developing the client, saving him/her from constantly consulting the docs, and offering suggestions. It also means the server can change resource names unilaterally (as long as the client software doesn't hardcode the URLs).
Compatibility with other tools
You can CURL your way into any part of the API or use the web browser to navigate resources. Makes debugging and testing integration much easier.
Standardized Verb Names
Allows you to specify actions without having to hunt the correct wording. Imagine if OOP getters and setters weren't standardized, and some people used retrieve and define instead. You would have to memorize the correct verb for each individual access point. Knowing there's only a handful of verbs available counters that problem.
Standardized Status
If you GET a resource that doesn't exist, you can be sure to get a 404 error in a RESTful API. Contrast it with a non-RESTful API, which may return {error: "Not found"} wrapped in God knows how many layers. If you need the extra space to write a message to the developer on the other side, you can always use the body of the response.
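To illustrate the status-code point, two hypothetical client helpers: one relies only on the standard status code (via Python's `http.HTTPStatus`), the other has to know a vendor-specific error envelope like the `{error: "Not found"}` example in the paragraph above. Both function names are made up.

```python
import json
from http import HTTPStatus

def review_state(status, body):
    """RESTful style: the standard status code alone says what happened."""
    if status == HTTPStatus.NOT_FOUND:
        return "missing"
    if 200 <= status < 300:
        return "ok"
    return f"error {status}: {body}"

def review_state_enveloped(status, body):
    """Hypothetical non-RESTful API that always answers 200 and reports
    failures in its own JSON envelope; the client must know that format."""
    data = json.loads(body)  # status is ignored: it is always 200 here
    if data.get("error") == "Not found":
        return "missing"
    return "ok"
```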
Example
Imagine two APIs with the same functionality, one following REST and the other not. Now imagine the following clients for those APIs:
RESTful:
GET /products/1052/reviews
POST /products/1052/reviews "5 stars"
DELETE /products/1052/reviews/10
GET /products/1052/reviews/10
HTTP:
GET /reviews?product_id=1052
POST /post_review?product_id=1052 "5 stars"
POST /remove_review?product_id=1052&review_id=10
GET /reviews?product_id=1052&review=10
Now think of the following questions:
If the first call of each client worked, how sure can you be the rest will work too?
There was a major update to the API that may or may not have changed those access points. How much of the docs will you have to re-read?
Can you predict the return of the last query?
You have to edit the review posted (before deleting it). Can you do so without checking the docs?
I recommend taking a look at Ryan Tomayko's How I Explained REST to My Wife
Third party edit
Excerpt from the Wayback Machine link:
How about an example. You’re a teacher and want to manage students:
what classes they’re in,
what grades they’re getting,
emergency contacts,
information about the books you teach out of, etc.
If the systems are web-based, then there’s probably a URL for each of the nouns involved here: student, teacher, class, book, room, etc. ... If there were a machine readable representation for each URL, then it would be trivial to latch new tools onto the system because all of that information would be consumable in a standard way. ... you could build a country-wide system that was able to talk to each of the individual school systems to collect testing scores.
Each of the systems would get information from each other using a simple HTTP GET. If one system needs to add something to another system, it would use an HTTP POST. If a system wants to update something in another system, it uses an HTTP PUT. The only thing left to figure out is what the data should look like.
I would suggest everybody, who is looking for an answer to this question, go through this "slideshow".
I couldn't understand what REST is and why it is so cool, its pros and cons, or its differences from SOAP, but this slideshow was so brilliant and easy to understand that it is much clearer to me now than before.
Caching.
There are other more in depth benefits of REST which revolve around evolve-ability via loose coupling and hypertext, but caching mechanisms are the main reason you should care about RESTful HTTP.
It's written down in the Fielding dissertation. But if you don't want to read a lot:
increased scalability (due to stateless, cache and layered system constraints)
decoupled client and server (due to stateless and uniform interface constraints)
reusable clients (client can use general REST browsers and RDF semantics to decide which link to follow and how to display the results)
non breaking clients (clients break only by application specific semantics changes, because they use the semantics instead of some API specific knowledge)
Give every “resource” an ID
Link things together
Use standard methods
Resources with multiple representations
Communicate statelessly
Is it possible to do everything with just POST and GET? Yes. Is it the best approach? No. Why? Because we have standard methods. If you think about it again, it would be possible to do everything using just GET, so why should we even bother to use POST? Because of the standards!
For example, today, thinking about an MVC model, you can limit your application to respond only to specific verbs like POST, GET, PUT and DELETE. Even if under the hood everything is emulated with POST and GET, doesn't it make sense to have different verbs for different actions?
Discovery is far easier in REST. We have WADL documents (similar to WSDL in traditional webservices) that will help you to advertise your service to the world. You can use UDDI discoveries as well. With traditional HTTP POST and GET people may not know your message request and response schemas to call you.
One advantage is that we can non-sequentially process XML documents and unmarshal XML data from different sources like an InputStream object, a URL, a DOM node...
@Timmmm, about your edit:
GET /timeline_posts // could return the N first posts, with links to fetch the next/previous N posts
This would dramatically reduce the number of calls
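A sketch of the paging idea, assuming a hypothetical representation where each page lists its items plus a server-supplied `next` link: the client only follows links until none is left, so here three requests fetch five posts instead of one request per post.

```python
# Hypothetical paged collection: each page carries its items and, if more
# remain, a "next" link supplied by the server.
PAGES = {
    "/timeline_posts":        {"items": [1, 2], "next": "/timeline_posts?page=2"},
    "/timeline_posts?page=2": {"items": [3, 4], "next": "/timeline_posts?page=3"},
    "/timeline_posts?page=3": {"items": [5],    "next": None},
}

def fetch(uri):
    return PAGES[uri]  # stands in for a GET plus parsing the response

def all_items(start):
    """Follow server-provided next links; the client never builds page URIs."""
    uri, items, requests = start, [], 0
    while uri:
        page = fetch(uri)
        requests += 1
        items.extend(page["items"])
        uri = page.get("next")
    return items, requests

print(all_items("/timeline_posts"))  # ([1, 2, 3, 4, 5], 3)
```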
And nothing prevents you from designing a server that accepts HTTP parameters to denote the field values your clients may want...
But this is a detail.
Much more important is the fact that you did not mention huge advantages of the REST architectural style (much better scalability, due to server statelessness; much better availability, due to server statelessness also; much better use of the standard services, such as caching for instance, when using a REST architectural style; much lower coupling between client and server, due to the use of a uniform interface; etc. etc.)
As for your remark
"Not all actions easily map to CRUD (create, read/retrieve, update,
delete)."
: an RDBMS uses a CRUD approach, too (SELECT/INSERT/DELETE/UPDATE), and there is always a way to represent and act upon a data model.
Regarding your sentence
"You may not even be dealing with object type resources"
: a RESTful design is, by essence, a simple design, but this does NOT mean that designing it is simple. Do you see the difference? You'll have to think a lot about the concepts your application will represent and handle, or, if you prefer, about what it must do, in order to represent this by means of resources. But if you do so, you will end up with a simpler and more efficient design.
Query-strings can be ignored by search engines.

Alternatives to YQL

This is a multi-part question. I just watched a very interesting presentation on YQL by the lead developer (a graduate of my MS program). While it was very compelling, and I am looking forward to trying it out, I am wondering if anyone knows of alternative frameworks for querying multiple web service APIs to make them appear seamless, the apparent purpose of YQL?
Yahoo's strategy has been to create XML schema definitions that bind a given web service's parameters into their YQL Open Table query parameters, which I think is very clever. Is there any tool that attempts (perhaps I am naive here) to automate the discovery of parameters in say a REST API? I am aware that with SOAP APIs, because there is a published WSDL, it makes automation easier, but is there yet no way to do this with REST? Is anyone trying?
Yes people are trying to produce description languages for REST. The most popular effort is WADL. There are lots of questions about WADL here on SO. Is it a good idea? In my opinion no.
REST does not need a discovery model beyond what it already has with hypermedia, because it is trying to solve a problem at a different architectural layer than web services. Web services deliver data to an application's business logic/domain model. REST is about delivering content and behaviour to a presentation layer.
How about an analogy? Think of the difference between an object and a struct in C++. A struct is just simple data that some client process is going to manipulate. That's what a web service does: it returns a chunk of data, a struct. Sure, maybe it did a bunch of server-side processing to produce the result, but the end result is a lump of data. A REST interface delivers an object, i.e. it contains both data and the methods that can be used to manipulate that object. By definition, if you understand the uniform interface and you understand the returned media type, you already know what you can do with the response. Discovery mechanisms are redundant.
If you find this hard to believe, then think about the web. How does a web browser discover web pages? The web has no formalized discovery mechanism, and yet there is a world of information out there that we can discover with a web browser.
There is this little website http://zachgrav.es/yql/tablesaw/ which indeed auto-discovers parameters in a REST api and turns it into a YQL compatible table.
There are two ways to find information. Either you use a 100% unambiguous language or you use a natural language. Anything in between like YQL is doomed to fail because it delivers neither and works well only with the examples its authors tout.
I blogged about this at http://zscraper.wordpress.com/2012/05/30/enough-with-crawling-2. My personal stance is that you'll always get the most accurate results if you do your homework first, i.e. study the target domain and figure out how to query it unambiguously.
To answer your question and give you an alternative -- try Bobik. This is a cloud-backed scraping service that you control via REST API. Compose your "queries" in traditional syntax (Bobik supports Javascript, JQuery, XPATH and CSS) and call Bobik to run them from any client-side environment (webpages, mobile apps, or your server).
Hope this helps.