If I have a resource at a certain URI, like https://api.example.com/things/my-things, and so far this resource can be served in the following representations:
application/xml
application/xhtml+xml
text/xml
text/html
How SHOULD the server inform clients asking for application/xml, application/xhtml+xml and text/xml that those representations are going to stop being supported? They are not being dropped right now, so a 406 Not Acceptable is not adequate.
I found an Internet-Draft, The Deprecation HTTP Header Field, but it is an Internet-Draft, not an RFC, and I am not sure whether using it here would be a valid implementation of the specification, or whether it would mean that the resource/URI itself is the one being deprecated.
Does anyone know an authoritative way to express that a representation of a resource is being deprecated and is going to reach its sunset, while the URI itself will remain available with a different set of representations?
Ultimately, what you want to convey to the client is a matter of API or application policy. There really isn't any standard way to express this via HTTP; at least, not today. Unless your clients are savvy, even if you did provide this information, they'd likely ignore it and you're back to 406 or 415.
The best standard way I can think of to negotiate this would require the client to send a HEAD request first with Accept or Content-Type, and then the server responds with 200 OK if the representation is allowed, or with the appropriate 406 or 415 otherwise. HTTP caching and/or other techniques can be used to minimize the number of negotiations, but in the worst-case scenario there are always two requests.
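A minimal client-side sketch of that HEAD-first negotiation, assuming the https://api.example.com/things/my-things URI from the question and a server that answers HEAD with 200, 406, or 415 (all of this is illustrative, not a prescribed implementation):

```python
import requests

URL = "https://api.example.com/things/my-things"  # placeholder URI from the question

def representation_supported(media_type: str) -> bool:
    """Probe with HEAD before committing to a full GET of that representation."""
    head = requests.head(URL, headers={"Accept": media_type})
    if head.status_code == 200:
        return True
    if head.status_code in (406, 415):
        return False
    head.raise_for_status()  # anything else is out of scope for this sketch
    return False

# Prefer XML while it lasts; fall back to HTML once the server stops offering it.
accept = "application/xml" if representation_supported("application/xml") else "text/html"
body = requests.get(URL, headers={"Accept": accept}).text
```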
The next best way would arguably be to enforce the policy through API versioning. Even though the versions would differ only in representation, all facets are clearly separated. If API version 1.0 supports application/xml, it should stay that way forever. This provides:
Stability and predictability for clients
The ability to easily identify clients on the old API (and possibly notify them)
Simplicity on the server
There are also a few ways to loosely advertise that a particular API version is being deprecated. You could use standard headers such as Pragma or Warning, or you can use something like api-deprecated-versions: 1.0, 1.1. This approach still requires a client to pay attention to these response headers and may not necessarily indicate when the API will transition from deprecated to completely sunset. Most mature server API policies would have a deprecation period of 6+ months, but this is by no means a hard and fast rule; you'd have to establish that with your clients. What this approach can do is enable telemetry owned by clients to detect that an API (and/or version) they are using is deprecated. This should alert client developers to determine the next course of action; for example, upgrading their client.
Depending on your versioning semantics, if they even exist, you can likely achieve a similar, albeit more streamlined, approach using OPTIONS. There isn't an Allow-Content-Type complement to Allow, but you could certainly define a custom one. You might also simply report api-supported-versions and api-deprecated-versions this way. This would enable tooling and clients to select or detect an appropriate endpoint and/or media type. A client might use this approach each time its application starts up to detect and record whether the endpoint it is using is still up to date.
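As a rough sketch of that OPTIONS approach (Flask is used here purely for illustration; the api-supported-versions and api-deprecated-versions header names are the custom ones suggested above, not any standard):

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/things/my-things", methods=["OPTIONS"])
def my_things_options():
    # Custom, non-standard headers advertising which versions are still supported
    # and which are being phased out; clients and tooling must know to look for them.
    response = make_response("", 204)
    response.headers["Allow"] = "GET, HEAD, OPTIONS"
    response.headers["api-supported-versions"] = "1.0, 1.1, 2.0"
    response.headers["api-deprecated-versions"] = "1.0, 1.1"
    return response
```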
A final suggestion could be to advertise this information by way of an OpenAPI (formerly Swagger) document. Such a document would indicate the available URLs, parameters, and media types. A client could request the appropriate document to determine whether their API and expected media type are supported.
Hopefully that gives you a few ideas. First, you need to define a policy and decide how that will be conveyed. You'll then need to educate your clients on how to take advantage of that information and capability. If they opt not to honor that information, then caveat emptor - they'll just get 406, 415, or some other appropriate error response.
There is no authoritative (normative) way of doing this now.
When I first sought to answer this question, it was in my head to suggest adding a header, and lo, that's been proposed.
The Deprecation HTTP Header Field you refer to appears to be on track to become that normative way of doing this.
It's also the simplest way to inform a client without the added complexity of other options. This way of informing the client means the client can 100% expect the API to behave the way it always has during the deprecation period, which is often critical.
Often resource, representation, and "representation of resource" can mean the same or different things depending on who you talk to. I would say that pragmatically from the client's perspective, they're the same thing, and so a header is a reasonable method of informing about deprecation.
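As a sketch of what that header-based approach could look like (Flask here for illustration; Deprecation comes from the draft, Sunset from RFC 8594, and the dates, link target, and exact value syntax are placeholders, so check the current draft before relying on them):

```python
from flask import Flask, Response, request

app = Flask(__name__)

DEPRECATED_TYPES = {"application/xml", "application/xhtml+xml", "text/xml"}

@app.route("/things/my-things")
def my_things():
    media_type = request.accept_mimetypes.best or "text/html"
    response = Response("<things/>", mimetype=media_type)  # stand-in payload

    if media_type in DEPRECATED_TYPES:
        # The representation still works, but its retirement is announced.
        response.headers["Deprecation"] = "Sun, 01 Jan 2023 00:00:00 GMT"  # placeholder date
        response.headers["Sunset"] = "Mon, 01 Jan 2024 00:00:00 GMT"       # placeholder date
        response.headers["Link"] = '<https://api.example.com/docs/deprecation>; rel="deprecation"'
    return response
```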
I'm aware of this Q/A. The answer doesn't help my situation much.
I'm designing a CQRS-based RESTful service that provides the current reputation of existing accounts. The reputation calculation can take up to 30 seconds.
Due to the long calculation time, and due to our desire to use CQRS on this for other reasons, the entire process is done in two (or more) requests:
FIRST REQUEST
A client tells the RESTful service to begin the calculation for the account.
The RESTful service basically responds "OK" and has another service start the calculation.
SUBSEQUENT REQUESTS
The client asks the RESTful service for the calculated reputation score, if available.
The RESTful service responds with either the reputation score, or a note that it's still calculating.
QUESTION #1
How should I structure the first request URI? For account number 12345, should it be something like one of the following?
PUT /accounts/12345 payload: {}
I'm concerned about this approach because I read PUT should be idempotent.
Another option:
POST /accounts/12345 payload: {} // ...but shouldn't POST contain the whole entity in the payload?
...or, maybe change the entity from an account to a command...
POST /command/reputation-calculation/12345 payload: {} // ...feels like we're getting off-course here
...or something else?
QUESTION #2
For the second request, that seems a bit more straightforward. Should the URI be something like this?
GET /accounts/12345/reputations
I appreciate your suggestions. Thanks.
I may have found an answer. It involves moving the CQRS away from the client, and into the control of the RESTful service, which may employ it optionally.
For each client request, the URI could be:
GET /accounts/12345/reputations
Upon receipt, the RESTful service could check to see if a recently-calculated reputation is available. If a recent reputation is available, the RESTful service replies with a 200 OK status and delivers the response payload containing the reputation.
If no recent reputation is available (nor is it in-process according to the calculating service), the RESTful service kicks into CQRS mode. It tells the calculating service to begin calculating the reputation.
Then, whether it initiated the calculation or found one already in progress, it returns to the client a 202 Accepted with a follow-up link.
This kind of async situation seems to be what was intended with the 202 Accepted response, per the docs.
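A rough server-side sketch of that flow (Flask for illustration; the get_recent_reputation, is_calculating, and start_calculation helpers are hypothetical stand-ins for calls to the calculating service):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-ins for the calculating service.
def get_recent_reputation(account_id):
    """Return a recently calculated score, or None if nothing fresh exists."""
    return None

def is_calculating(account_id):
    """Return True if a calculation for this account is already in progress."""
    return False

def start_calculation(account_id):
    """Kick off the (up to 30 second) calculation asynchronously."""

@app.route("/accounts/<account_id>/reputations")
def reputations(account_id):
    score = get_recent_reputation(account_id)
    if score is not None:
        return jsonify({"accountId": account_id, "reputation": score}), 200

    if not is_calculating(account_id):
        start_calculation(account_id)

    # 202 Accepted: the request was fine, the result just isn't ready yet.
    response = jsonify({"status": "calculating"})
    response.status_code = 202
    response.headers["Retry-After"] = "5"  # hint: poll this same URI again shortly
    return response
```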
REST is not a great fit for CQRS command-oriented systems. So if the client is command-oriented then just use HTTP as the transport.
Alternatively, the client could create a reputation-enquiry resource that has a status of calculating or ready. The back-end decides if the result of a recent calculation can be reused, in which case the status will be immediately ready. A score is supplied when the resource is ready.
How should I structure the first request URI?
The spelling of the URI doesn't matter -- the machines don't care very much. If you think about how generic clients cache documents, you might choose to use a target-uri specifically to invalidate one specific resource in the cache.
Using POST is a natural fit with the way HTTP caching works. Also, it's how we would do it in HTML, so you know it has a track record.
So if you do GET /accounts/12345 to view the current reputation score, then POST /accounts/12345 is a perfectly reasonable way to kick off the upgrade process.
If you were using hypermedia (which is one of the REST constraints) then the response you get from the POST would include in it the URI for the status monitor. So that URI could be anything you want (since the client just goes where you tell it to). The server will probably want to get a hint back to know which update you are asking about; so it could be GET /reputationUpdate/67890.
There might be advantages to having the status monitor in the same hierarchy as the account, so GET /accounts/12345/reputationUpdate/67890 would also be fine.
Spellings that are consistent with the web linking specification often make implementing the client or server easier, because you can grab a library off the shelf that understands templates.
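From the client's point of view, the POST-then-monitor flow above could look roughly like the sketch below (the statusMonitor field name, the polling interval, and the idea that the monitor URI comes back in the POST response body are all assumptions; a hypermedia response would simply tell the client where to go):

```python
import time
import requests

# Kick off the reputation upgrade for account 12345 (placeholder host).
started = requests.post("https://api.example.com/accounts/12345")
monitor_uri = started.json()["statusMonitor"]  # assumed field carrying the status-monitor URI

# Follow the status monitor the server handed back until the work is done.
while True:
    status = requests.get(monitor_uri)
    if status.status_code == 200:
        print(status.json())
        break
    time.sleep(5)  # a polite client would honour Retry-After instead of a fixed sleep
```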
Suppose that the server restricts a JSON field to an enumerated set of values.
e.g. a POST request to /user expects an object with a field called gender that should only be "male", "female" or "n/a".
Should a wrapper library make sure that the field is set correctly before making the request?
Pro: Makes it possible for the client to quickly reject input that would otherwise require a roundtrip to the server. In some cases this would allow for a much better UX.
Con: You have to keep the library in sync with the backend; otherwise you could reject some valid input.
With a decent type system you should encode this particular restriction in the library API anyway. I think people usually validate at least basic things on the client and let the server do further validation, like things that can't be verified on the client at all.
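For example, a Python wrapper could push the restriction into the type system with an Enum so callers can't even construct an invalid value (the /user endpoint and field names mirror the example above; everything else is illustrative):

```python
from enum import Enum

import requests

class Gender(Enum):
    MALE = "male"
    FEMALE = "female"
    NOT_APPLICABLE = "n/a"

def create_user(name: str, gender: Gender) -> requests.Response:
    """Wrapper call; the server remains the final authority and may still reject the request."""
    if not isinstance(gender, Gender):
        raise TypeError("gender must be a Gender member")
    return requests.post(
        "https://api.example.com/user",  # placeholder host
        json={"name": name, "gender": gender.value},
    )

# create_user("Alex", Gender.NOT_APPLICABLE)
```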
This is a design choice - the enum type constraint should be documented in the public API of the server and it's part of its contract.
Clients are forced to obey the contract to make a successful request, but are not required to implement the validation logic. You can safely let clients fail with a "Bad Request" or another 4xx error.
Implementing the validation logic on both sides couples the client and the server - any changes to the validation logic should be implemented on both sides.
If the validation logic is something closer to common sense (e.g. this field should not be empty) it can safely be implemented on both sides.
If the validation logic is something more domain specific, I think it should be kept on the backend side only.
You have to think about the same trade-offs with a wrapper library (which can be viewed as a client of the server API). It depends on the role of the wrapper library: if it should expose the full API contract of the server, then by all means the validation logic can be duplicated in the wrapper; otherwise I would keep it on the backend.
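A minimal sketch of keeping that enforcement on the backend (Flask for illustration, shaped like the /user example above):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

ALLOWED_GENDERS = {"male", "female", "n/a"}

@app.route("/user", methods=["POST"])
def create_user():
    payload = request.get_json(silent=True) or {}
    gender = payload.get("gender")
    if gender not in ALLOWED_GENDERS:
        # The constraint is part of the documented contract; a client that
        # ignores it simply gets a 400 Bad Request explaining what is accepted.
        return jsonify({
            "error": "invalid value for 'gender'",
            "allowed": sorted(ALLOWED_GENDERS),
        }), 400
    # ... persist the user here ...
    return jsonify({"gender": gender}), 201
```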
The wrapper library is the actual client of the REST API and hence has to adhere to both the architectural and the protocol-imposed constraints. In his blog post, Fielding explained some of the constraints further. One of them is typed resources, which states that clients shouldn't assume the API returns a specific type, e.g. user details in JSON. This is what media types and content negotiation are actually for.
The definition of a media type may give clients a hint on how to process the data received, as with the JSON- or XML-based vCard formats. As media types define the actual format of a specific document, they may contain processing rules such as pre-validation requirements or syntax rules, e.g. through XML Schema or JSON Schema validation.
One of the basic rules in remote computing, though, is never to trust received input, so the server should validate the request regardless of whether the client has done any pre-validation. Due to the typed-resource constraint, a true RESTful client will check whether the received media type supports pre-validation through its spec, and only apply pre-validation if the spec defines it and also describes a mechanism for performing it (e.g. through a certain schema mechanism).
My personal opinion is that if you try to follow the REST architectural approach, you shouldn't validate the input unless the media type explicitly supports it. Since a client will learn through error responses which fields and values a certain REST endpoint expects, and the server hopefully validates the inputs anyway, I don't see the necessity to validate on the client side. As performance considerations are often more important than following the rules and recommendations, it is ultimately up to you. Note, however, that this may or may not couple the client to the server and hence increase the risk of breaking more easily on server changes. As REST is not a protocol but a design suggestion, it is up to you which route you prefer.
TL;DR : scroll down to the last paragraph.
There is a lot of talk about best practices when defining RESTful APIs: what HTTP methods to support, which HTTP method to use in each case, which HTTP status code to return, when to pass parameters in the query string vs. in the path vs. in the content body vs. in the headers, how to do versioning, result set limiting, pagination, etc.
If you are already determined to make use of best practices, there are lots of questions and answers out there about what is the best practice for doing any given thing. Unfortunately, there appears to be no question (nor answer) as to why use best practices in the first place.
Most of the best practice guidelines direct developers to follow the principle of least surprise, which, under normal circumstances, would be a good enough reason to follow them. Unfortunately, REST-over-HTTP is a capricious standard, the best practices of which are impossible to implement without becoming intimately involved with it, and the drawback of intimate involvement is that you tend to end up with your application being very tightly bound to a particular transport mechanism. So, some people (like me) are debating whether the benefit of "least surprise" justifies the drawback of littering the application with REST-over-HTTP concerns.
A different approach examined as an alternative to best practices suggests that our involvement with HTTP should be limited to the bare minimum necessary in order to get an application-defined payload from point A to point B. According to this approach, you only use a single REST entry point URL in your entire application, you never use any HTTP method other than HTTP POST, never return any HTTP status code other than HTTP 200 OK, and never pass any parameter in any way other than within the application-specific payload of the request. The request will either fail to be delivered, in which case it is the responsibility of the web server to return an "HTTP 404 Not Found" to the client, or it will be successfully delivered, in which case the delivery of the request was "HTTP 200 OK" as far as the transport protocol is concerned, and anything else that might go wrong from that point on is exclusively an application concern, and none of the transport protocol's business. Obviously, this approach is kind of like saying "let me show you where to stick your best practices".
Now, there are other voices that say that things are not that simple, and that if you do not follow the RESTful best practices, things will break.
The story goes that, for example, in the event of unauthorized access, you should return an actual "HTTP 401 Unauthorized" (instead of a successful response containing a json-serialized UnauthorizedException) because upon receiving the 401, the browser will prompt the user for credentials. Of course this does not really hold any water, because REST requests are not issued by browsers being used by human users.
Another, more sophisticated way the story goes is that usually, between the client and the server exist proxies, and these proxies inspect HTTP requests and responses, and try to make sense out of them, so as to handle different requests differently. For example, they say, somewhere between the client and the server there may be a caching proxy, which may treat all requests to the exact same URL as identical and therefore cacheable. So, path parameters are necessary to differentiate between different resources, otherwise the caching proxy might only ever forward a request to the server once, and return cached responses to all clients thereafter. Furthermore, this caching proxy may need to know that a certain request-response exchange resulted in a failure due to a particular error such as "Permission Denied", so as to again not cache the response, otherwise a request resulting in a temporary error may be answered with a cached error response forever.
So, my questions are:
Besides "familiarity" and "least surprise", what other good reasons are there for following REST best practices? Are these concerns about proxies real? Are caching proxies really so dumb as to cache REST responses? Is it hard to configure the proxies to behave in less dumb ways? Are there drawbacks in configuring the proxies to behave in less dumb ways?
It's worth considering that what you're suggesting is the way that HTTP APIs used to be designed for a good 15 years or so. API designers are tending to move away from that approach these days. They really do have their reasons.
Some points to consider if you want to avoid using ReST over HTTP:
ReST over HTTP is an efficient use of the HTTP/S transport mechanism. Avoiding the ReST paradigm runs the risk of every request / response being wrapped in verbose envelopes. SOAP is an example of this.
ReST encourages client and server decoupling by putting application semantics into standard mechanisms - HTTP and XML/JSON (or other data formats). These protocols and standards are well supported by standard libraries and have been built up over years of experience. Sure, you can create your own 'unauthorized' response body with a 200 status code, but ReST frameworks just make it unnecessary, so why bother?
ReST is a design approach which encourages a view of your distributed system that focuses on data rather than functionality, and this has proven a useful mechanism for building distributed systems. Avoiding ReST runs the risk of focusing on very RPC-like mechanisms, which have some risks of their own:
they can become very fine-grained and 'chatty'
which can be an inefficient use of network bandwidth
which can tightly couple client and server, through introducing statefulness and temporal coupling between requests.
and can be difficult to scale horizontally
Note: there are times when an RPC approach is actually a better way of breaking down a distributed system than a resource-oriented approach, but they tend to be the exceptions rather than the rule.
existing tools for developers make debugging / investigation of ReSTful APIs easier. It's easy to use a browser to do a simple GET, for example. And tools such as Postman or RestClient already exist for more complex ReST-style queries. In extreme situations tcpdump is very useful, as are browser debugging tools such as Firebug. If every API call has application-layer semantics built on top of HTTP (e.g. special response types for particular error situations) then you immediately lose some of the value of this tooling. Building SOAP envelopes in Postman is a pain. As is reading SOAP response envelopes.
network infrastructure around caching really can be as dumb as you're suggesting. It's possible to get around this, but you really do have to think about it, and it will inevitably involve increased network traffic in some situations where it's unnecessary. And caching responses for repeated queries is one way in which APIs scale out, so you'll likely need to 'solve' the problem of how to cache repeated queries yourself (i.e. reinvent the wheel).
Having said all that, if you want to look into a pure message-passing design for your distributed system rather than a ReSTful one, why consider HTTP at all? Why not simply use some message-oriented middleware (e.g. RabbitMQ) to build your application, possibly with some sort of HTTP bridge somewhere for Internet-based clients? Using HTTP as a pure transport mechanism with simple 'message accepted / not accepted' semantics seems like overkill.
REST is intended for long-lived network-based applications that span multiple organizations. If you don’t see a need for the constraints, then don’t use them. -- Roy T Fielding
Unfortunately, there appears to be no question (nor answer) as to why use best practices in the first place.
When in doubt, go back to the source
Fielding's dissertation really does quite a good job at explaining how the REST architectural constraints ensure that you don't destroy the properties those constraints are designed to protect.
Keep in mind - before the web (which is the reference application for REST), "web scale" wasn't a thing; the notion of a generic client (the browser) that could discover and consume thousands of customized applications (provided by web servers) had not previously been realized.
According to this approach, you only use a single REST entry point URL in your entire application, you never use any HTTP method other than HTTP POST, never return any HTTP status code other than HTTP 200 OK, and never pass any parameter in any way other than within the application-specific payload of the request.
Yup - that's a thing, it's called RPC; you are effectively taking the web, and stripping it down to a bare message transport application that just happens to tunnel through port 80.
In doing so, you have stripped away the uniform interface -- you've lost the ability to use commodity parts in your deployment, because nobody can participate in the conversation unless they share the same interpretation of the message data.
Note: that doesn't at all imply that RPC is "broken"; architecture is about tradeoffs. The RPC approach gives up some of the value derived from the properties guarded by REST, but that doesn't mean it doesn't pick up value somewhere else. Horses for courses.
Besides "familiarity" and "least surprise", what other good reasons are there for following REST best practices?
Cheap scaling of reads - as your offering becomes more popular, you can service more clients by installing a farm of commodity reverse-proxies that will serve cached representations where available, and only put load on the server when no fresh representation is available.
Prefetching - if you are adhering to the safety provisions of the interface, agents (and intermediaries) know that they can download representations at their own discretion without concern that the operators will be liable for loss of capital. AKA - your resources can be crawled (and cached)
Similarly, use of idempotent methods (where appropriate) communicates to agents (and intermediaries) that retrying the send of an unacknowledged message causes no harm (for instance, in the event of a network outage).
Independent innovation of clients and servers, especially cross organizations. Mosaic is a museum piece, Netscape vanished long ago, but the web is still going strong.
Of course this does not really hold any water, because REST requests are not issued by browsers being used by human users.
Of course they are -- where do you think you are reading this answer?
So far, REST works really well at exposing capabilities to human agents; which is to say that the server side is so ubiquitous at this point that we hardly think about it any more. The notion that you -- the human operator -- can use the same application to order pizza, run diagnostics on your house, and remote start your car is as normal as air.
But you are absolutely right that replacing the human still seems a long ways off; there are various standards and media types for communicating semantic content of data -- the automated client can look at markup, identify a phone number element, and provide a customized array of menu options from it -- but building into agents the sorts of fuzzy intelligence needed to align offered capabilities with goals, or to recover from error conditions, seems to be a ways off.
What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
Why not just make every client use the “GET” method w/ any/all relevant params, headers, request bodies, JSON, etc.?
On the server side, the response to each method is custom & independently coded!
For example, what difference does it make to issue a database query via GET instead of POST?
I understand that GET is for queries that don’t change the DB (or anything else?).
And POST is for calls that do make changes.
But, near as I can tell, the RESTful standard doesn’t prevent one from coding up a server response to GET that issues a stored procedure call that indeed DOES change the DB.
Vice versa… the RESTful standard doesn’t prevent one from coding up a server response to POST that issues a stored procedure call that indeed does NOT change ANYTHING!
I’m not disputing that a midtier (HTTP) “RESTlike” layer is necessary. It clearly is.
Let's say I'm wrong (and I may be). Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
The following do not directly address my questions but merely dance uncomfortably around them like an acidhead stoner at a Dead concert:
Different Models for RESTful GET and POST
RESTful - GET or POST - what to do?
GET vs POST in REST Web Service
PUT vs POST in REST
I just spent ~80 hours trying to communicate a PATCH to my REST server (older Android Java doesn't recognize the newer PATCH method, so I had to issue a stupid kluge X-HTTP-Method-Override header instead). A POST would have worked fine, but the sysop wouldn't budge because he respects REST.
I just don’t understand why we bother with each individual method. They don't seem to have much impact on idempotence. They seem to be mere guidelines. And if you "violate" these "guidelines" they give someone else a chance to point a feckless finger at you. But so what?
Aren't these guidelines more trouble than they're worth?
I'm just confused. Please excuse the stridency of my post.
Aren’t REST GET/POST/etc. methods superfluous?
What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
First, a clarification. Those aren't RESTful methods; those are HTTP methods. The web is a reference implementation (for the most part) of the REST architectural style.
Which means that the authoritative answers to your questions are documented in the HTTP specification.
But, near as I can tell, the RESTful standard doesn’t prevent one from coding up a server response to GET that issues a stored procedure call that indeed DOES change the DB.
The HTTP specification designates certain methods as being safe. Casually, this designates that a method is read only; the client is not responsible for any side effects that may occur on the server.
The purpose of distinguishing between safe and unsafe methods is to allow automated retrieval processes (spiders) and cache performance optimization (pre-fetching) to work without fear of causing harm.
But you are right, the HTTP standard doesn't prevent you from changing your database in response to a GET request. In fact, it even calls out specifically a case where you may choose to do that:
a safe request initiated by selecting an advertisement on the Web will often have the side effect of charging an advertising account.
The HTTP specification also designates certain methods as being idempotent
Of the request methods defined by this specification, PUT, DELETE, and safe request methods are idempotent.
The motivation for having idempotent methods? Unreliable networks
Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response.
Note that the client here might not be the user agent, but an intermediary component (like a reverse proxy) participating in the conversation.
Thus, if I'm writing a user agent, or a component, that needs to talk to your server, and your server conforms to the definition of methods in the HTTP specification, then I don't need to know anything about your application protocol to know how to correctly handle lost messages when the method is GET, PUT, or DELETE.
On the other hand, POST doesn't tell me anything, and since the unacknowledged message may still be on its way to you, it is dangerous to send a duplicate copy of the message.
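To make that concrete, here is the kind of generic retry logic a client library or intermediary can apply without knowing anything about your application, purely because the method advertises idempotency (retry count and back-off are arbitrary; this is a sketch, not a hardened client):

```python
import time
import requests

IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}

def send(method: str, url: str, retries: int = 3, **kwargs) -> requests.Response:
    """Retry on connection failures only when the method guarantees a repeat is harmless."""
    attempt = 0
    while True:
        try:
            return requests.request(method, url, **kwargs)
        except requests.ConnectionError:
            attempt += 1
            if method.upper() not in IDEMPOTENT_METHODS or attempt > retries:
                # POST (and other non-idempotent methods) must not be replayed blindly:
                # the unacknowledged original may still be on its way to the server.
                raise
            time.sleep(2 ** attempt)  # simple exponential back-off

# send("PUT", "https://api.example.com/accounts/12345", json={"name": "Alex"})  # placeholder
```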
Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
Absolutely -- remember, the reference implementation of hypermedia is HTML, and HTML doesn't include support for PUT or DELETE. If you want to afford a hypermedia control that invokes an unsafe operation, while still conforming to the HTTP and HTML standards, then POST is your only option.
Aren't these guidelines more trouble than they're worth?
Not really? They offer real value in reliability, and the extra complexity they add to the mix is pretty minimal.
I just don’t understand why we bother with each individual method. They don't seem to have much impact on idempotence.
They don't impact it, they communicate it.
The server already knows which of its resources are idempotent receivers. It's the client and the intermediary components that need that information. The HTTP specification gives you the ability to communicate that information for free to any other compliant component.
Using the maximally appropriate method for each request means that you can deploy your solution into a topology of commodity components, and it just works.
Alternatively, you can give up reliable messaging. Or you can write a bunch of custom code in your components to tell them explicitly which of your endpoints are idempotent receivers.
POST vs PATCH
Same song, different verse. If a resource supports OPTIONS, GET, and PATCH, then I can discover everything I need to know to execute a partial update, and I can do so using the same commodity implementation I use everywhere else.
Achieving the same result with POST is a whole lot more work. For instance, you need some mechanism for communicating to the client that POST has partial update semantics, and what media-types are accepted when patching a specific resource.
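A sketch of that discovery with commodity parts: OPTIONS reports whether PATCH is allowed at all, and the Accept-Patch header (RFC 5789) reports which patch media types the resource accepts. The URL and patch document below are placeholders:

```python
import requests

URL = "https://api.example.com/accounts/12345"  # placeholder resource

options = requests.options(URL)
allowed = {m.strip() for m in options.headers.get("Allow", "").split(",")}
patch_types = options.headers.get("Accept-Patch", "")

if "PATCH" in allowed and "application/merge-patch+json" in patch_types:
    # Partial update using a standard patch format (JSON merge patch).
    requests.patch(
        URL,
        json={"displayName": "New Name"},
        headers={"Content-Type": "application/merge-patch+json"},
    )
```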
What do I lose by making every call from the client a GET and having the server honor it just by paying attention to the request and not the method?
Conforming user-agents are allowed to assume that GET is safe. If you have side effects (writes) on endpoints accessible via GET, then the agent is allowed to pre-fetch the endpoint as an optimization -- the side effects start firing even though nobody expects it.
If the endpoint isn't an idempotent receiver, then you have to consider that the GET calls can happen more than once.
Furthermore, the user agent and intermediary components are allowed to make assumptions about caching -- requests that you expect to get all the way through to the server don't, because conforming components along the way are permitted to serve replies out of their own cache.
To ice the cake, you are introducing an additional risk: undefined behavior.
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
Where I believe you are coming from, though I'm not certain, is more of an RPC point of view. Client sends a message, server responds; so long as both participants in the conversation have a common understanding of the semantics of the message, does it matter if the text in the message says "GET" or "POST" or "PATCH"? Of course not.
RPC is a fantastic choice when it fits the problem you are trying to solve.
But...
RPC at web scale is hard. Can your team deliver that? Can your team deliver it cost-effectively?
On the other hand, HTTP at scale is comparatively simple; there's an enormous ecosystem of goodies, using scalable architectures, that are stable, tested, well understood, and inexpensive. The tires are well and truly kicked.
You and your team hardly have to do anything; a bit of block and tackle to comply with the HTTP standards, and from that point on you can concentrate on delivering business value while you fall into the pit of success.