Supporting a RESTful service - Should I use Pragma, Cookies, a Custom Header or something else to identify client sessions and transactions? - rest

The problem
We are building an SOA with a RESTful approach to the services. Once the systems are in production we will have many clients consuming the interface including internal and 3rd party systems.
We would like to be able to consume and echo in the response information provided by the client application such as: -
Session Id - Could be a Java EE session id or anything client specific, this is useful for the support team and debugging client issues to trace them through all our systems.
Transaction Id - A unique identifier for the request that we can echo back to the client to aid the client in request/response correlation if they invoke the service asynchronously or if we implement an 202 Accepted style long running process.
Potential solutions
So sticking to the RESTful constraints would suggest we need to utilise HTTP to implement this and there are several options we could implement.
Pragma header - implement extension-pragmas for transaction-id, session-id, etc. This seems like purist of solutions as it utilises a standard HTTP header although I would be concerned it became a dumping ground for everything we can't be bothered to think about properly.
X-My-Header - custom headers for each field we require. May be stripped by proxies, not core HTTP so feels anti-rest
In query string or XML/JSON representations - Add the fields to all our resources. Because it's an operational parameter it feels like it should be provided as metadata rather than on a resource.
Cookies - use Cookie and Set-Cookie to hold custom key-values; useful in the case of session ids as most implementations use cookies already. Would have to re-send every time to support client side correlation which kind of defeats the point of using a cookie.
The answer
Is there any precedent for this? Are we mad? Is there something obvious I am missing in all my research? Does no-one actually care how they will support their services once they are deployed? Should I just shut up and go away?
I'm hoping someone can help.
P.S. sorry if this is a bit of an essay, the advice did say "be specific"....

Oh, this is a pain. I've been there too.
Well, the idea with metadata for transactions, sessions etc. is a good idea. For logging, at least.
The problem is to setup something that is compliant with various corporations policies and SOA infrastructure.
There is a tradeof between best design and maximum interopability in the case of HTTP.
The safe path is to encode the metadata in the message itself. Not very nice, and such a solution ends up looking a bit like SOAP where you have an envelope with headers for all messages.
I ended up using an X-header for information such as transaction id. However, as you mentioned, proxies/b2b-gateways etc. might strip headers, it's not obvious that you can retreive them with all appointed development frameworks, COTS applications etc. So if you do like this, you should avoid make the metadata mandatory to get a solution running - just "nice to have".
Cookies are nothing but pain. They might be annoying or sometimes even useful with browser interaction, but in a SOA scenario, it will be bad idea. Many things can go wrong and it's a pain to debug cross organisations.
I would also avoid using query strings along with POST or PUT data. It's possible according to the HTTP specs. but not when it comes to implementation in random framework.

You can use a GUID and let the client generate it and pass it as part of any request that starts the workflow/business process. This GUID can be used to correlate across multiple components participating in the workflow.

Related

Can GET, PUT and PATCH be replaced with POST HTTP method?

POST , PUT, PATCH and GET are all different. Idempotent and safety being the key difference makers.
While writing RESTFul APIs , I encountered guidelines on when and where to use one of the HTTP methods. Since I am using Java for the back-end implementation, I can control the behavior of the HTTP methods on the persistent data.
For example , GET v1/book/{id} can be replaced with POST v1/book (with "id" in body) now with that id I can perform a query on db , fetching that particular book. (assuming book with that id already exists).
Similarly , I can achieve the workings of PATCH and PUT with POST itself.
Now, coming to the question , why don't we just use POST instead of GET , PUT and PATCH almost every time, ALMOST, when we can control the idempotent and safety behavior in the back-end?
Or , Is it just a guideline mentioned in RESTFul docs somewhere or stated by Roy fielding and we all are blindly following? Even if the guidelines are so what is the major idea behind them?
https://restfulapi.net/rest-put-vs-post/
https://restful-api-design.readthedocs.io/en/latest/methods.html
https://www.keycdn.com/support/put-vs-post
Above resources just mention either what does all the methods do or their differences. Articles mention the workings as if they were some guidelines , none of the docs online speak about the reason behind them.
None of them says , what if I used POST instead of PUT, PATCH and GET, what would be the side-effects? (as I can control their behaviors in the back-end)
Http methods are designed in the way that each method holds some responsibility. I will say that REST are the standards which are conventions and not the obligation. The convention doesn't stress us to follow the rules but they are designed for our code betterment. You can tweak the things and can use them in your way but that would be a bad idea. Like in this case if you are performing all the three actions with one method it would create great confusion in code (As the simple definition of POST is to create an object and that is what understood by everyone) and also degrade our coding standards.
I strongly discourage to replace three methods with one.
If you do that, you can't say you are "writing RESTFul APIs".
Whoever knows the RESTFul standard, will be confused about the behaviour of your apis.
If you fit the standard, then you will have an easier life.
After all, you have no real benefit in your approach.
HTTP is a transport protocol which as its name suggest is responsible for transfering data such as files or db entries across the wire to or from a remote system. In version 0.9 you basically only had the GET operation at your disposal while in HTTP 1.0 almost all of the current operations were added to the spec.
Each of these methods fulfills its own purpose. POST i.e. does process the payload according to the server's own semantics, whatever they will be. In theory it could be used therefore for retrieving, updating or removing content. Though, to a client it is basically unclear what a server actually does with the payload. There is no guarantee whether invoking a URI with that method is safe (the remote resource being altered) or not. Think of a crawler that is just invoking any URIs it finds and one of the links is an order link or a link where you perform a payment process. Do you really want a crawler to trigger one of your processes? The spec is rather clear that if something like that happens, the client must not made accountable for that. So, if a crawler ordered 10k products as one of your links, did trigger such a process, and the products are created in that process, you can't claim refund from the crawler's maintainer.
In additon to that, a response from a GET operation is cacheable by default. So if you invoke the same resource twice in a certain amount of time, chances are that the resource does not need to be fetched again a second (third, ...) time as it can be reused from the cache. This can reduce the load on the server quite significantly if used propperly.
As you've mentioned Fielding and REST. REST is an architectural style which you should use if you have plenty of different clients connecting to your services that are furthermore not under your control. Plenty of so-called REST APIs aren't adhering to REST as they follow a more simple and pragmatic RPC approach with external documentations such as Swagger and similar. RESTs main focus is on the decoupling of clients from servers which allow the latters to evolve freely without having to fear breaking clients. Clients on the other hand get more robust to changes.
Fielding only added few constraints a REST architecture has to adhere to. One of them is support for caching. Though Fielding later on wrote a well-cited blog-post where he explains what API designers have to consider before calling their API REST. True decoupling can only occur if all of the constraints are followed strictly. If only one clients violates these premises it won't benefit from REST at all.
The main premise in REST is (and should always be): Server teaches clients what they need and clients only use what they are served with. In the browsable Web, the big cousin of REST, a server will teach a client i.e. on what data the server expects via Web Forms through HTML and links are annotated with link-relation names to give the browser some hints on when to invoke that URI. On a Web page a trash bin icon may indicate a delition while a pencil icon may indicate an edit link. Such visual hints are also called affordacne. Such visual hints may not be applicable in machine to machine communication though such affordances may hint on other things they may provide. Think of a stylesheet that is annotated with preload. In HTTP 2 i.e. such a resource could be attempted to be pushed by the server or in HTTP 1.1 the browser could alread load that stylesheet while the page is still parsed to speed things up. In order to gain whitespread knowledge of those meanings, such values should be standardized at IANA. Through custom extensions or certain microformats such as dublin core or the like you may add new relation names that are too specific for common cases but are common to the domain itself.
The same holds true for media-types client and server negotiate about. A general applicable media-type will probably reach wider acceptance than a tailor-made one that is only usable by a single company. The main aim here is to reach a point where the same media-type can be reused for various areas and APIs. REST vision is to have a minimal amount of clients that are able to interact with a plethora of servers or APIs, similar to a browser that is able to interact with almost all Web sites.
Your ultimate goal in REST is that a user is following an interaction protocol you've set up, which could be something similar to following an order process or playing a text game or what not. By giving a client choices it will progress through a certain process which can easily be depicted as state machine. It is following a kind of application-driven protocol by following URIs that caught the clients attention and by returning data that was taught through a form like representation. As, more or less, only standardized representation formats should be used, there is no need for out-of-band information on how to interact with the API necessary.
In reality though, plenty of enterprises don't really care about long-lasting APIs that are free to evolve over the years but in short-term success stories. They usually also don't care that much whether they use the propper HTTP operations at all or stay in bounds with the HTTP spec (i.e. sending payloads with HTTP GET requesst). Their primary intent is to get the job done. Therefore pragmatism usually wins over design and as such plenty of developers follow the way of short success and have to adept their work later on, which is often cumbersome as the API is now the driving factor of their business and therefore they can't change it easily without having to revampt the whole design.
... why don't we just use POST instead of GET , PUT and PATCH almost every time, ALMOST, when we can control the idempotent and safety behavior in the back-end?
A server may know that a request is idempotent, but the client does not. Properties such as safe and idempotency are promisses to the client. Whether the server satisfies these or not is a different story. How should a client know whether a sent payment request reached the server and the response just got lost or the initial request didn't make it to the server at all in case of a temporary connection issue? A PUT requests does guarante idempotency. I.e. you don't want to order the same things twice if you resubmit the same request again in case of a network issue. While the same request could also be sent via POST and the server being smart enough to not process it again, the client doesn't know the server's behavior unless it is externally documented somehwere, which violates REST principles again also somehow. So, to state it differently, such properties are more or less promisses to the client, less to the server.

Adapter Proxy for Restful APIs

this is a general 'what technologies are available' question.
My company provides a web application with a RESTful API. However, it is too slow for my needs and some of the results are in an awkward format.
I want to wrap their restful server with a proxy/adapter server, so when you connect to the proxy you get the RESTful API I wish the real one provides.
So it needs to do a few things:
passthrough most requests
cache some requests
do some extra requests on the original server to detect if a request is cacheable
for instance: there is a request for a field in a record: GET /records/id/field which might be slow, but there is a fingerprint request GET /records/id/fingerprint which is always fast. If there exists a cache of GET /records/1/field2 for the fingerprint feedbeef, then I need to check the original server still has the fingerprint feed beef before serving the cached version.
fix headers for some responses - e.g. content-type, based upon the path
do stream processing on some large content, for instance
GET /records/id/attachments/1234
returns a 100Mb log file in text format
remove null characters from files
optionally recode the log to filter out irrelevant lines, reducing the load on the client
cache the filtered version for later requests.
While I could modify the client to achieve this functionality, such code would not be re-usable for other clients (different languages), and complicates the client logic.
I had a look at whether clojure/ring could do it, and while there is a nice little proxy middleware for it, it doesn't handle streaming content as far as I can tell - the whole 100Mb would have to be downloaded. Also it doesn't include any cache logic yet.
I took a look at whether squid could do it, but I'm not familiar with the technology, and it seems mostly concerned with passing through requests rather than modifying them on the fly.
I'm looking for hints where I might find the correct technology to implement this. I'm mostly language agnostic if learning a new language gets me access to a really simple way to do it.
I believe you should choose a platform that is easier for you to implement your custom business logic on. The following web application frameworks provide easy connectivity with REST APIs, and allow you to create a web application that could work as a REST proxy:
Play framework (Java + Scala)
express + Node.js (Javascript)
Sinatra (Ruby)
I'm more familiar with Play, of which I know it provides utilities for caching you could find useful, and is also extendable by a number of plugins.
If you are familiar with Scala, you could have a also have a look at Finagle. It is a framework build be Twitter's infrastructure team to provide protocol-agnostic connectivity. It might be an overkill for REST to REST proxy, but it provides abstractions you might find useful.
You could also look at some 3rd party services like Apitools, which allows to create a proxy programmatically (in lua). Apirise is a similar service (of which I'm a co-founder) that intends to do provide similar functionalities with a user-friendly UI.
Beeceptor does exactly what you want. It plugs in-between your web-app and original API to route requests.
For your use-case of caching a few responses, you can create a rule. That way it shall not hit the original endpoint.
The requests to original APIs can be mocked, and you can inspect response
You can simulate delays.
(Note: it is a shameless plug, I am the author of Beeceptor and thought it should help you and other developers.)
https://github.com/nodejitsu/node-http-proxy is looking useful - although I don't yet know if it can stream process for transcoding.

RESTful API runtime discoverability / HATEOAS client design

For a SaaS startup I'm involved in, I am building both a RESTful web API and a couple of client apps on different platforms that consume it. I think I've got the API figured out, but now I'm turning to the clients. As I've been reading about REST, I see that a key part of REST is discovery, but there seems to be a lot of debate between two different interpretations of what discovery really means:
Developer discovery: The developer hard-codes copious amounts of API details into the client, such as resource URI's, query parameters, supported HTTP methods, and other details that they've discovered through browsing the docs and experimenting with the API's responses. This type of discovery IMHO necessitates cool linkage and the API versioning question, and leads to hard coupling of the client code to the API. Not much better than if using a well-documented collection of RPC's it seems.
Runtime discovery - The client app itself is able to figure out everything it needs with little or no out-of-band information (presumably, only a knowledge of the media types the API deals with.) Links can be hot. But to make the API very efficient, a lot of link templating for query parameters seems to be needed, which makes out-of-band info creep back in. There are possibly other difficulties I haven't thought of yet since I haven't gotten to that point in development. But I do like the idea of loose coupling.
Runtime discovery seems to be the holy grail of REST, but I'm seeing precious little discussion about how to implement such a client. Almost all REST sources I've found seem to assume Developer discovery. Anyone know of some Runtime discovery resources? Best practices? Examples or libraries with real code? I'm working in PHP (Zend Framework) for one client. Objective-C (iOS) for the other.
Is Runtime discovery a realistic goal, given the present set of tools and knowledge in the developer community? I can write my client to treat all of the URI's in an opaque manner, but how to do this most efficiently is a question, especially over low-bandwidth connections. Anyway, URI's are only part of the equation. What about link templating in the Runtime context? How about communicating what methods are supported, aside from making a lot of OPTIONS requests?
This is definitely a tough nut to crack. At Google, we've implemented our Discovery Service that all our new APIs are built against. The TL;DR version is we generate a JSON Schema-like spec that our clients can parse - many of them dynamically.
That results means easier SDK upgrades for the developer and easy/better maintenance for us.
By no means the perfect solution, but many of our devs seem to like.
See link for more details (and make sure to watch the vid.)
Fascinating. What you are describing is basically the HATEOAS principle. What is HATEOAS you ask? Read this: http://en.wikipedia.org/wiki/HATEOAS
In layman's terms, HATEOAS means link following. This approach decouples your client from specific URL's and gives you the flexibility to change your API without breaking anyone.
You did your home work and you got to the heart of it: runtime discovery is holy grail. Don't chase it.
UDDI tells a poignant story of runtime discovery: http://en.wikipedia.org/wiki/Universal_Description_Discovery_and_Integration
One of the requirements that should be satisfied before you can call an API 'RESTful' is that it should be possible to write a generic client application on top of that API. With the generic client, a user should be able to access all the API's functionality. A generic client is a client application that does not assume that any resource has a specific structure beyond the structure that is defined by the media type. For example, a web browser is a generic client that knows how to interpret HTML, including HTML forms etc.
Now, suppose we have a HTTP/JSON API for a web shop and we want to build a HTML/CSS/JavaScript client that gives our customers an excellent user experience. Would it be a realistic option to let that client be a generic client application? No. We want to provide a specific look-and-feel for every specific data element and every specific application state. We don't want to include all knowledge about these presentation-specifics in the API, on the contrary, the client should define the look and feel and the API should only carry the data. This implies that the client has hard-coded coupling of specific resource elements to specific layouts and user interactions.
Is this the end of HATEOAS and thus the end of REST? Yes and no.
Yes, because if we hard-code knowledge about the API into the client, we loose the benefit of HATEOAS: server-side changes may break the client.
No, for two reasons:
Being "RESTful" is a property of the API, not of the client. As long as it is possible, in theory, to build a generic client that offers all capabilities of the API, the API can be called RESTful. The fact that clients don't obey the rules, is not the API's fault. The fact that a generic client would have a lousy user experience is not an issue. Why is it important to know that it is possible to have a generic client, if we don't actually have that generic client? This brings me to the second reason:
A RESTful API offers clients the option to choose how generic they want to be, i.e. how resilient to server-side changes they want to be. Clients which need to provide a great user experience may still be resilient to URI changes, to changes in default values and more. Clients doing batch jobs without user interaction may be resilient to other kinds of changes.
If you are interested in practical examples, checkout my JAREST paper. The last section is about HATEOAS. You will see that with JAREST, even highly interactive and visually attractive clients can be quite resilient to server-side changes, though not 100%.
I think the important point about HATEOAS is not that it is some holy grail client-side, but that it isolates the client from URI changes - it is assumed you are using known (or developer discovered custom) Link Relations that will allow the system to know which link for an object is the editable form. The important point is to use a media type that is hypermedia aware (e.g. HTML, XHTML, etc).
You write:
To make the API very efficient, a lot of link templating for query parameters seems to be needed, which makes out-of-band info creep back in.
If that link template is supplied in the previous request, then there is no out-of-band information. For example a HTML search form uses link templating (/search?q=%#) to generate a URL (/search?q=hateoas), but nothing is known by the client (the web browser) other than how to use HTML forms and GET.

Should a Netflix or Twitter-style web service use REST or SOAP? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I've implemented two REST services: Twitter and Netflix. Both times, I struggled to find the use and logic involved in the decision to expose these services as REST instead of SOAP. I hope somebody can clue me in to what I'm missing and explain why REST was used as the service implementation for services such as these.
Implementing a REST service takes infinitely longer than implementing a SOAP service. Tools exist for all modern languages/frameworks/platforms to read in a WSDL and output proxy classes and clients. Implementing a REST service is done by hand and - get this - by reading documentation. Furthermore, while implementing these two services, you have to make "guesses" as to what will come back across the pipe as there is no real schema or reference document.
Why write a REST service that returns XML anyway? The only difference is that with REST you don't know the types each element/attribute represents - you are on your own to implement it and hope that one day a string doesn't come across in a field you thought was always an int. SOAP defines the data structure using the WSDL so this is a no-brainer.
I've heard the complaint that with SOAP you have the "overhead" of the SOAP Envelope. In this day and age, do we really need to worry about a handful of bytes?
I've heard the argument that with REST you can just pop the URL into the browser and see the data. Sure, if your REST service is using simple or no authentication. The Netflix service, for instance, uses OAuth which requires you to sign things and encode things before you can even submit your request.
Why do we need a "readable" URL for each resource? If we were using a tool to implement the service, do we really care about the actual URL?
A canary in a coal mine.
I have been waiting for a question like this for close to a year now. It was inevitable that this day would come and I am sure we are going to see many more questions like this in the coming months.
The warning signs
You are absolutely correct, it does take longer to build RESTful clients than SOAP clients. The SOAP toolkits take away lots of boilerplate code and make client proxy objects available with almost no effort. With a tool like Visual Studio and a server URL I can be accessing remote objects of arbitrary complexity, locally in under five minutes.
Services that return application/xml and application/json are so annoying for client developers. What are we supposed to do with that blob of data?
Fortunately, lots of sites that provide REST services also provide a bunch of client libraries so that we can use those libraries to get access to a bunch of strongly typed objects. Seems kind of dumb though. If they had used SOAP we could have code-gen’d those proxy classes ourselves.
SOAP overhead, ha. It’s latency that kills. If people are really concerned about the number of excess bytes going across the wire then maybe HTTP is not the right choice. Have you seen how many bytes are used by the user-agent header?
Yeah, have you ever tried using a web browser as debugging tool for anything other than HTML and javascript. Trust me it sucks. You can only use two of the verbs, the caching is constantly getting in the way, the error handling swallows so much information, it’s constantly looking for a goddamn favicon.ico. Just shoot me.
Readable URL. Only nouns, no verbs. Yeah, that’s easy as long as we are only doing CRUD operations and we only need to access a hierarchy of objects in one way. Unfortunately most applications need a wee bit more functionality than that.
The impending disaster
There are a metric boatload of developers currently developing applications that integrate with REST services who are in the process of coming to the same set of conclusions that you have. They were promised simplicity, flexibility, scalability, evolvabilty and the holy grail of serendipitous reuse. The characteristics of the web itself, how can things go wrong.
However, they are finding that versioning is just as much of a problem, but the compiler doesn’t help detect issues. The hand written client code is a pain to maintain as the data structures evolve and URLs get refactored. Designing APIs around just nouns and four verbs can be really hard, especially with RESTful Url zealots telling you when you can and cannot use query strings.
Developers are going to start asking why are we wasting our effort on support both Json formats and Xml formats, why not just focus our efforts on one and do it well?
How did things go so wrong
I’ll tell you what went wrong. We as developers let the marketing departments take advantage of our primary weakness. Our eternal search for the silver bullet blinded us to the reality of what REST really is. On the surface REST seems so easy and simple. Name your resources with Urls and use GET, PUT, POST and DELETE. Hell, us devs already know how to do that, we have been dealing with databases for years that have tables and columns and SQL statements that have SELECT, INSERT, UPDATE and DELETE. It should have been a piece of cake.
There are other parts of REST that some people discuss, such as self-descriptiveness, and the hypermedia constraint, but these constraints are not so simple as resource identification and the uniform interface. The seem to add complexity where the desired goal is simplicity.
This watered down version of REST became validated in developer culture in many ways. Server frameworks were created that encouraged Resource Identification and the uniform interface, but did nothing to support the other constraints. Terms started to float around differentiating the approaches, (HI-REST vs LO-REST, Corporate REST vs Academic REST, REST vs RESTful).
A few people scream out that if you don’t apply all of the constraints it’s not REST. You will not get the benefits. There is no half REST. But those voices were labelled as religious zealots who were upset that their precious term had been stolen from obscurity and made mainstream. Jealous people who try to make REST sound more difficult than it is.
REST, the term, has definitely become mainstream. Almost every major web property that has an API supports "REST". Twitter and Netflix are two very high profile ones. The scary thing is that I can only think of one public API that is self-descriptive and there are a handful that truly implement the hypermedia constraint. Sure some sites like StackOverflow and Gowalla support links in their responses, but there are huge gaping holes in their links. The StackOverflow API has no root page. Imagine how successful the web site would have been if there was no home page for the web site!
You were misled I’m afraid
If you have made it this far, the short answer to your question is those APIs (Netflix and Twitter) do not conform to all of the constraints and therefore you will not get the benefits that REST apis are supposed to bring.
REST clients do take longer to build than SOAP clients but they are not tied to one specific service, so you should be able to re-use them across services. Take the classic example, of a web browser. How many services can a web browser access? What about a Feed Reader? Now how many different services can the average Twitter client access? Yes, just one.
REST clients are not supposed to be built to interface with a single service, they are supposed to be built to handle specific media types that could be served by any service. The obvious question to that is, how can you build a REST client for a service that delivers application/json or application/xml. Well you can’t. That’s because those formats are completely useless to a REST client. You said it yourself,
you have to make "guesses" as to what
will come back across the pipe as
there is no real schema or reference
document
You are absolutely correct for services like Twitter. However, the self-descriptive constraint in REST says that the HTTP content type header should describe exactly the content that is being transmitted across the wire. Delivering application/json and application/xml tells you nothing about the content.
When it comes to considering the performance of REST based systems it is necessary look at the bigger picture. Talking about envelope bytes is like talking about loop unwinding when comparing a quick-sort to a shell-sort. There are scenarios where SOAP can perform better, and there are scenarios where REST can perform better. Context is everything.
REST gains much of its performance advantage by being very flexible about what media types it supports and by having sophisticated support for caching. For caching to work well though nearly all of the constraints must be adhered to.
Your last point about readable urls is by far the most ironic. If you truly commit to the hypermedia constraint, then every URL could be a GUID and the client developer would lose nothing in readability.
The fact that URIs should be opaque to the client is one of the most key things when developing REST systems. Readable URLs are convenient for the server developer and well structured URLs make it easier for the server framework to dispatch requests, but those are implementation details that should have no impact on the developers consuming the API.
The Twitter API is not even close to being RESTful and that is why you are unable to see any benefit to using it over SOAP. The Netflix API is much closer but it’s use of generic media types demonstrates that failing to adhere to even a single constraint can have a profound impact on the benefits derived from the service.
It may not be all their fault
I’ve done a whole lot of dumping on the service providers, but it takes two to dance RESTfully. A service may follow all of the constraints religiously and a client can still easily undo all of the benefits.
If a client hard codes urls to access certain types of resources then it is preventing the server from changing those urls. Any kind URL construction based on implicit knowledge of how the service structures its urls is a violation.
Making assumptions about what type of representation will be returned from a link can lead to problems. Making assumptions about the content of the representation based on knowledge that is not explicitly stated in the HTTP headers is definitely going to create coupling that will cause pain in the future.
Should they have used SOAP?
Personally, I don’t think so. REST done right allows a distributed system to evolve over the long term. If you are building distributed systems that have components that are developed by different people and need to last for many years, then REST is a pretty good option.
SOAP is an object-oriented, remote procedure call technology stack. It works by building a new abstraction on top of an existing protocol (HTTP).
REST is a document oriented approach, that simply uses the features of an existing protocol (HTTP). "REST" is just a buzzword -- the concept is this: Just use the web the way it was designed to work!
In response to edits to question:
"Implementing a REST service takes infinitely longer than implementing a SOAP service."
Um, no, it can't be infinitely longer. And in cases where what you are trying to retrieve is already a document or file, it's actually much faster. For example, the OGC spec for WMS (Web Mapping Service) defines both a SOAP and REST version of the protocol, and there's a reason why almost nobody implements the SOAP version -- it's because if you're trying to get a map, it's a lot easier to just build a URL and fetch image bytes from that URL than it is to bother with encapsulating it into a SOAP message. But yes, I will agree that if the point of the web service is to transfer some strongly-typed object in a domain object model, SOAP is better suited for that use.
"Why write a REST service that returns XML anyway?"
Well, yes, that can be silly. But it depends on what the XML is. If there's a clearly defined schema for it somewhere, then there's no ambiguity. For example, you can think of WSDL URLs as being a kind of RESTful web service for retrieving information about a web service. In this case, adding the overhead of another SOAP request would be pointless.
In general, REST wins when the content that is being transferred can be thought of as a file, as a single unit. SOAP wins when the content needs to be treated as an object with members.
"I've heard the complaint that with SOAP you have the "overhead" of the SOAP Envelope. In this day and age, do we really need to worry about a handful of bytes?"
Yes. Not in every circumstance, but there are sites with a great deal of traffic where it makes a difference. Is it enough of a difference to outweigh the semantic differences of using SOAP instead of REST? I doubt it. If you're doing an object remoting protocol and the number of bytes is making a difference, SOAP is probably not the tool for you anyway -- maybe you should be using CORBA or DCOM instead.
"I've heard the argument that with REST you can just pop the URL into the browser and see the data."
Yes, and this is a large argument in favor of REST if it makes sense to view the data in a browser. For example, with image data, it's an easy way to debug the service -- just paste the URL into your browser's address bar and see what the image looks like. Or if the data returned is in XML, and you have a referenced XML stylesheet that renders into readable HTML in the browser, then you get the benefit of semantic markup and easy visualization all in one package. But you are correct, this benefit mostly evaporates when working with more complex authentication schemes. If you can't encode all your authentication information into each HTTP request, then I would argue that it doesn't count as REST at all.
"Why do we need a "readable" URL for each resource? If we were using a tool to implement the service, do we really care about the actual URL?"
Well, it depends. Why do we need readable URLs for any resource on the web? You can read Tim Berners-Lee's essay Cool URIs Don't Change for the rationale, but basically, as long as the resource may still be useful in the future, the URI for that resource should stay the same.
Obviously, for transient resources (like the "today's Money" link in the essay) there is no need for it, since the need to reference the resource goes away if the corresponding resource goes away. But for more permanent resources (like StackOverflow questions, for example, or movies on IMDB), you want to have a URL that will work forever. When you're designing a web service, you need to decide if the resources themselves could outlive your service, and if so, then REST is probably the right way to go.
For the record, yes, I've been developing web pages since well before NetFlix or Twitter existed. And no, I've not yet had any need or opportunity to implement a client to either NetFlix or Twitter's services. But even if their services are atrociously difficult to work with, that doesn't mean the technology they implemented their services on top of is bad -- only that those two implementations are bad.
To make a long story short: REST and SOAP are just tools. They each have strengths and weaknesses. If the only tool you have is a hammer, then every problem looks like a nail. So get to know both tools, and learn how to use them correctly, and then choose the right tool for each job.
An honest question deserves an honest answer. But first, why did you use the text of this question as an answer to another question if you did not think it was rhetorical in nature?
Anyway:
"Tools exist for all modern languages/frameworks/platforms to read in a WSDL and output proxy classes and clients. Implementing a REST service is done by hand by reading documentation."
Just like browser vendors have read and re-read the HTML 4.01 specification up and down to try to implement a consistent browsing experience. Have you reflected on the fact that browsers were invented long before internet banking and stackoverflow, and yet, you can use a browser to do just those things. This is made possible because of the sole reason that everybody agrees to use HTML (and related formats like CSS, JS, JPEG etc).
Blogging is actually not that new, and someone came up with AtomPub, which allows any blogging software to access and update posts in a blog, much like any web browser can access any web page. That's pretty neat, and works because of the RESTful constraints imposed by the protocol.
But for Twitter and Netflix, there is no universal agreement that "all microblogs in existence shall use the media type application/tweet", mainly because microblogging is so new. Maybe in a few years time a few microblogging services settle on the same API so that Twitter, Facebook, Identica and can interoperate. None of their existing APIs are anywhere near RESTful, however much they claim, so I don't expect it to happen real soon.
"Furthermore, while implementing these two services, you have to make "guesses" as to what will come back across the pipe as there is no real schema or reference document."
You've hit the nail on the head. REST is all about distributed and hypermedia, and that pretty much sums it up. A browser looks at what it gets from a request and shows it to the user. A HTML page usually spawns a lot more GET requests, for example CSS, scripts and images. An image is typically only rendered to the screen, JavaScript is executed, and so on. Each time, the browser does what it does because it found the link in an <img> or <style> tag and the response media type was image/jpeg or text/css.
If Twitter makes a hypermedia based API, it will probably always return an application/tweet every time you follow a link to a tweet, but the client should never assume it, and always check what it gets before acting on it.
"Why write a REST service that returns XML anyway?"
This all boils down to media types. Like HTML, if you see an element that you've no idea what actually means, the HTML spec instructs you to ignore them, and process the "body" of the tag if it has one. Likewise, the atom spec instructs you to ignore unknown elements and foreign markup (from different namespaces) and not process the body (IIRC).
Designing media types for generic problem domains (as in the HTML media type for the rich text problem domain) is very hard. Making media types for very narrow problem domains is probably a lot easier (like a tweet). But it's always a good idea to design for extensibility and specify how clients (and servers) are supposed to react when they see elements or data items that don't match the spec. JPEG, for example has an Application-specific record type (e.g. APP1) which is used to contain all sorts of meta data.
"I've heard the complaint that with SOAP you have the "overhead" of the SOAP Envelope. In this day and age, do we really need to worry about a handful of bytes?"
No, we don't. REST is absolutely not about being efficient over the wire, it's actually trading wire efficiency in. REST's efficiency comes from the possibilities of caching enabled by all the other constraints: Fielding's dissertation notes: The trade-off, though, is that a uniform interface degrades efficiency, since information is transferred in a standardized form rather than one which is specific to an application's needs. The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction. I don't think that the SOAP Envelope byte count overhead is a valid concern.
"I've heard the argument that with REST you can just pop the URL into the browser and see the data."
Yes, that's also an invalid argument. It doesn't work that way. Even if it did work, most narrow REST APIs out there use media types that browsers have no idea about and it still won't work.
But there are a lot more possibilities than a browser to test a HTTP based API, like command line utilities or browser extensions that allow you to control almost any aspect of a HTTP request, inspect response headers and discover links for you to follow. But even so, this is nowhere near as easy as generating WSDL stubs and making a three line program to call the function anyway.
"Why do we need a "readable" URL for each resource? If we were using a tool to implement the service, do we really care about the actual URL?"
If you look at how the web works, I'm pretty sure that humans are by and large glad that the URI for a wikipedia page looks like this, http://en.wikipedia.org/wiki/Stack_overflow instead of http://en.wikipedia.org/wiki/?oldid=376349090. But it actually is not important to REST. The important thing to try to get right is to choose to place relevant data in the URI that is not likely to change. You might think that the database ID will never change, but what happens when two data sets need to be merged? All your primary keys change. The page title (Stack_overflow) will not change.
Sorry for the long response, but I believe this question is valid, and hasn't been addressed before here on SO. I'm sure Darrel Miller will add his answer once he's back too.
Edit: formatting
Martin Fowler has a post on the Richardson Maturity Model which does a great job explaining the difference between SOAP and REST.
WSDL and other document level protocols are redundant. The HTTP protocol supports a much richer set of operations besides just serving documents and submitting forms.
Supporters of REST are uncomfortable with that redundancy.

SOAP - What's the point?

I mean, really, what is the point of SOAP?
Web services have been around for a while, and for a while it seemed that the terms 'SOAP' and 'Web service' were largely interchangeable. However SOAP always seemed unwieldy and massively overcomplicated to me.
Then REST came along, and suddenly web services made sense.
As Joel Spolsky says, give a programmer a REST URL, and they can start playing with the service right away, figuring it out.
SOAP is obfuscated behind WSDLs and massively verbose XML, and despite being web based, you can't do anything as simple as access a SOAP service with a web browser.
So the essence of my question is:
Are there any good reasons to ever choose SOAP over REST?
Are you working with SOAP now? Would it be better if the interface was REST?
Am I wrong?
As Joel Spolsky says, give a programmer a REST URL, and they can start playing with the service right away, figuring it out.
Whereas if the service had a well specified, machine readable contract, then the programmer wouldn't have to waste any time figuring it out.
(not that WSDL/SOAP is necessarily an example of good implementation of a well specified contract, but that was the point of WSDL)
Originally, SOAP was a simple protocol which allowed you to add a header to a message, and had a standardized mapping of object instances to XML structures. Putting the handling metadata in the message simplified the client code, and meant you could very simply persist and queue messages.
I never needed the header processing details when I built SOAP services back in 2001. This was pre-WSDL, and it was then normal to use GET for getting information and queries (no different to most applications which claim to be REST; REST has more in terms of using hyperlinks for service discovery) and POST with a SOAP payload to perform actions. Those actions which created resources would return the URL of the created resource to the client, and the client could then GET the resource. I think it's the fact that WSDL made it easy to think only in terms of RPC rather than actions which create resources which made SOAP lose the plot.
The way I see it, SOAP might be more "flexible", but as a result it's just way too complicated (you mentioned the WSDL, which is always a stumbling block to me personally).
I get REST. It's simple. The only downside I might see is that you are limiting yourself to those 4 basic actions against a single resource, which might not exactly fit the way you view your data.
The topic is well-discussed in Why is soap considered to be thick.
While doing some research to understand some of the answers here (especially John Saunders') I found this post http://harmful.cat-v.org/software/xml/soap/simple
SOAP is more insane than I thought...
The point of WSDL was auto-discovery. The idea was that you wouldn't have to write client code, it would be auto-generated.
BTW. next step beyond WSDL are Semantic Web Services.
If you don't need the features of the WS-* series of protocols; if you don't need self-describing services; if your service cannot be completely described as resources, as defined by the HTTP protocol; if you don't like having to author XML for every interaction with the service, and parse it afterwards; then you need SOAP.
Otherwise, sure, use REST.
There's been some question about the value of a self-describing service. My imagination fails me when it comes to imagining how anyone could fail to understand this. That's on me. Still, I have to think that anyone who has ever used a service much more complicated than "Hello, world" would know why it is valuable to have someone else write the code that accepts parameters, creates the XML to send to the service, sends it, receives the response, then turns that back into objects.
Now, I suppose this might not be necessary when using a RESTful service; at least not with a RESTful service that does not process complex objects. Even with a relatively simple service like http://www.earthtools.org/webservices.htm (which I've used as an example of calling a RESTful service), one benefits from understanding the structure of the returned data. Even the above service provides an XML Schema - it unfortunately doesn't describe the entire response. Given that schema one still has to manually process the XML, or else use a tool to produce serializable classes from the schema.
All of this happens for you when the service is described in a WSDL, and you use a tool like "Add Service Reference" in Visual Studio, or the svcutil.exe program, or I-forget-what-the-command-is-in-Eclipse.
If you want examples, start with the EarthTools services, and go on to any other services with more complicated messaging.
BTW, another thing that requires self-description is description of the messaging patterns and protocols supported by the service. Perhaps that's not required when the only choices are HTTP verbs over HTTP or HTTPS. Life gets more complicated if you're using WS-Security and friends.
I find that SOAP fits in most appropriately when there is a high probability that a service will be consumed by corporate off the shelf (COTS) software. Because of the well specified contract employed by SOAP/WSDL most COTS packages have built in functionality for consuming such services. This can make it easy for BPM/workflow tools etc. to simply consume defined services without customization. Beyond that service use case REST tends to be my goto web service implementation for applications.
Well it appears now that the WSI agree that SOAP no longer has a point as they have announced they will cease to exist as an independent entity.
Interesting article about the announcement and some commentary here: http://blogs.computerworlduk.com/simon-says/2010/11/the-end-of-the-road-for-web-services/index.htm
Edited to be completely accurate in response to John Saunders.
I think SOAP appeals to the Java and .net crowd who may be more familiar with the old CORBA and COM and less familiar with internet technologies.
REST also has one major drawback: there is very little guidance on how to actually implement such a system. You will find significant variations on how many of the public RESTful APIs have been designed. In fact many violate key aspects of REST (such as using GET for manipulation or POST for retrieval) and there are disagreements over fundamental usage (POST/GET vs POST/GET/PUT/DELETE).
Am I wrong?
"You're not wrong, Walter, you're just... :)"
Are there any good reasons to ever choose SOAP over REST?
SOAP, to my understanding adheres to a contract, thus can be type checked.
SOAP is a lightweight XML based structured protocol specification to be used in the implementation of services . It is used for exchanging
structured information in a decentralized, distributed environment. SOAP uses XML technologies for exchanging of information over any transport layer protocol.
It is independent of any particular programming model and other implementation specific semantics. Learn More about XML
SOAP Messaging Framework
XML-based messaging framework that is
1) Extensible : Simplicity remains one of SOAP's primary design goals. SOAP defines a communication framework that allows for features such as security, routing, and
reliability to be added later as layered extensions
2) Inter operable : SOAP can be used over any transport protocol such as TCP, HTTP, SMTP. SOAP provides an explicit binding today for HTTP.
3) Independent : SOAP allows for any programming model and is not tied to Remote procedure call(RPC). SOAP defines a model for processing individual, one-way messages.
SOAP also allows for any number of message exchange patterns (MEPs) .Learn more about SOAP