Should a REST API wrapper validate inputs before making a request?


Suppose that the server restricts a JSON field to an enumerated set of values.
e.g. a POST request to /user expects an object with a field called gender that should only be "male", "female" or "n/a".
Should a wrapper library make sure that the field is set correctly before making the request?

Pro: Makes it possible for the client to quickly reject input that would otherwise require a roundtrip to the server. In some cases this would allow for a much better UX.
Con: You have to keep the library in sync with the backend; otherwise you could reject some valid input.
With a decent type system, you should encode this particular restriction in the library API anyway. I think people usually validate at least basic things on the client and let the server do further validation, such as checks that can't be performed on the client at all.
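A minimal Python sketch of encoding the restriction in the library API, assuming the wrapper mirrors the three server values in an enum. `Gender`, `build_create_user_body`, `parse_gender` and the extra `name` field are hypothetical names chosen for illustration, not part of any real library.

```python
from enum import Enum


class Gender(Enum):
    # Hypothetical mirror of the server's enumerated values for "gender".
    MALE = "male"
    FEMALE = "female"
    NOT_APPLICABLE = "n/a"


def build_create_user_body(name: str, gender: Gender) -> dict:
    """Build the JSON body for a hypothetical POST /user request.

    Callers must pass a Gender member; a static type checker (e.g. mypy)
    flags a plain string, and at runtime a string has no `.value`.
    """
    return {"name": name, "gender": gender.value}


def parse_gender(raw: str) -> Gender:
    # Untyped input (e.g. from a UI) is converted once; Gender("x")
    # raises ValueError, so bad input is rejected before any roundtrip.
    return Gender(raw)
```

The cost named in the "Con" above still applies: if the server adds a fourth value, this enum must be updated before clients can send it.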

This is a design choice - the enum type constraint should be documented in the public API of the server and it's part of its contract.
Clients are forced to obey the contract to make a successful request, but are not required to implement the validation logic. You can safely let the clients fail with "Bad Request" or another 4xx error.
Implementing the validation logic on both sides couples the client and the server - any changes to the validation logic should be implemented on both sides.
If the validation logic is something closer to common sense (e.g. this field should not be empty) it can safely be implemented on both sides.
If the validation logic is something more domain specific, I think it should be kept on the backend side only.
You have to think about the same trade-offs with a wrapping library (which can be seen as a client of the server API). It depends on the role of the wrapping library: if it should expose the full API contract of the server, then by all means the validation logic can be duplicated in the wrapping library; otherwise I would keep it on the backend.
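For the "keep it on the backend" option, a wrapper can stay thin and simply surface the server's 4xx answer. This is a hedged Python sketch: the injected `transport` callable, `HttpResponse`, `ApiClientError` and `UserApi` are hypothetical stand-ins so the example does not depend on any particular HTTP library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HttpResponse:
    status: int
    body: dict


class ApiClientError(Exception):
    """Raised when the server rejects a request with a 4xx status."""

    def __init__(self, status: int, body: dict):
        super().__init__(f"server rejected request with status {status}")
        self.status = status
        self.body = body


# A transport takes (method, path, json_body) and returns an HttpResponse.
Transport = Callable[[str, str, dict], HttpResponse]


class UserApi:
    def __init__(self, transport: Transport):
        self._transport = transport

    def create_user(self, body: dict) -> dict:
        # No client-side validation: the server owns the contract and its
        # 4xx response is passed on to the caller. (5xx handling omitted.)
        response = self._transport("POST", "/user", body)
        if 400 <= response.status < 500:
            raise ApiClientError(response.status, response.body)
        return response.body
```

Because the wrapper never duplicates the enum, a server-side change to the allowed values needs no library release; the price is a roundtrip before the caller learns about invalid input.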

The wrapper library is the actual client of the REST API and hence has to adhere to both the architectural and the protocol-imposed constraints. In his blog post, Fielding explained some of these constraints further. One of them is the "typed resource" constraint, which states that clients shouldn't assume the API returns a specific type, e.g. some user details in JSON. This is what media types and content negotiation are actually for.
The definition of a media type may give clients a hint on how to process the data received, e.g. the JSON- or XML-based vCard formats. As media types define the actual format of a specific document, they may contain processing rules such as pre-validation requirements or syntax regulations, e.g. through XML Schema or JSON Schema validation.
One of the basic rules in remote computing, though, is to never trust received input, so the server should validate it regardless of whether the client has pre-validated it. Due to the typed-resource constraint, a truly RESTful client will check whether the received media type supports pre-validation through its spec, and only apply pre-validation if the spec defines it and also describes a mechanism for performing it (e.g. a certain schema mechanism).
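The rule "only pre-validate if the media type defines how" can be sketched as a registry keyed by media type. This is an illustrative Python sketch: `application/vnd.example.user+json` and its gender rule are hypothetical, and a real media type would more likely point to a JSON Schema than to a hand-written function.

```python
from typing import Callable, Dict, List

Validator = Callable[[dict], List[str]]


def validate_vnd_example_user(doc: dict) -> List[str]:
    # Rules assumed to come from the (hypothetical) media type's spec.
    errors = []
    if doc.get("gender") not in {"male", "female", "n/a"}:
        errors.append("gender must be one of 'male', 'female', 'n/a'")
    return errors


# Only media types whose specification defines validation get an entry.
VALIDATORS: Dict[str, Validator] = {
    "application/vnd.example.user+json": validate_vnd_example_user,
}


def pre_validate(media_type: str, doc: dict) -> List[str]:
    """Return validation errors, or [] if the media type defines no rules."""
    validator = VALIDATORS.get(media_type)
    if validator is None:
        # Generic types like application/json define no field rules: skip
        # client-side checks and rely on the server's response instead.
        return []
    return validator(doc)
```

Either way the server still validates; the registry only decides whether the client may reject input early.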
My personal opinion is that if you try to follow the REST architectural approach, you shouldn't validate the input unless the media type explicitly supports it. As a client will learn through error responses which fields and values a certain REST endpoint expects, and the server hopefully validates the inputs anyway, I don't see the necessity to validate on the client side. Since performance considerations are often more important than following rules and recommendations, though, it is up to you. Note, however, that client-side validation may couple the client to the server and hence increase the risk of the client breaking when the server changes. As REST is not a protocol but a design suggestion, it is up to you which route you prefer.

Related

Is it a good practice on a REST api to check for unexpected parameters in the body?

I'm a junior developer working on my first job.
We encountered an error in our application because a teammate misused an endpoint we made: a typo in an optional parameter in a POST body led the backend to continue as if the optional parameter had not been set.
I'm wondering what the usual best approach is to prevent these kinds of user errors. Is it bad practice to have endpoints check that they only receive the request body data they expect, with no extra fields?

Is it a good practice on a REST api to check for unexpected parameters in the body?
Maybe. More degrees of freedom is good for compatibility, but that benefit needs to be measured against the increased complexity.
You can think of the body of the HTTP request as a message, with a schema - that schema might be implicit or explicit, it might be a standardized message or something bespoke.
Problem: how confident are we that the schema won't need to change? How expensive is it to coordinate a change between the server and the remote client(s)?
One way to keep the costs of future change in check is to design your message processing models such that new clients can communicate with old servers, and new servers can communicate with old clients.
The simplest processing model that enables compatible changes is to ignore content that is not understood. -- David Orchard, 2003
You can still report on unrecognized fields, which gives server operators a mechanism for detecting clients with typos in their schema implementations. It is a possibly friendlier operational experience, at the cost of more code. Trade-offs.
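A small Python sketch of both options discussed above: ignore-and-report by default, or reject when `strict=True`. The helper names and the `notify`/`notfiy` fields are hypothetical; `logging` is the standard library module.

```python
import logging
from typing import Any, Dict, Set, Tuple

logger = logging.getLogger("api")


class UnknownFieldsError(ValueError):
    pass


def split_known_fields(
    body: Dict[str, Any], known: Set[str], strict: bool = False
) -> Tuple[Dict[str, Any], Set[str]]:
    """Separate recognized fields from unrecognized ones.

    strict=False follows "ignore content that is not understood" but logs
    the extras so operators can spot typos; strict=True rejects them.
    """
    unknown = set(body) - known
    if unknown:
        if strict:
            raise UnknownFieldsError(f"unexpected fields: {sorted(unknown)}")
        logger.warning("ignoring unrecognized fields: %s", sorted(unknown))
    recognized = {k: v for k, v in body.items() if k in known}
    return recognized, unknown
```

With `known={"name", "notify"}`, a body containing the typo `"notfiy": true` is processed without it but leaves a log line behind, which is exactly the case from the question.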
One form of complexity that may arise when ignoring content: it becomes (somewhat?) more difficult to distinguish normal traffic from hostile traffic. See Dan Bergh Johnsson's work on Secure By Design.

What to use: PATCH or POST?

I had quite a long debate with my colleague about the proper HTTP verb to use for one of our operations that changes the STATE of a resource.
Suppose we have a resource called WakeUpLan that tries to send an event to a system connected to a network. It is a kind of generic state machine:
{
  id: 1,
  retries: {
    idle: 5, // after 5 retries it went to FAILED state
    wakeup: 0,
    process: 0,
    shutdown: 0
  },
  status: 'FAILED',
  // other attributes
}
IDLE --> WAKEUP ---> PROCESS ---> SHUTDOWN | ----> [FAILED]
Every state has a retry mechanism, e.g. in the IDLE case it tries x times to transition to WAKEUP, and after x retries it gives up and goes to the FAILED state.
Any FAILED resource can be manually restarted or retried one more time from some interface.
So we are unsure which HTTP verb best suits this case.
In my opinion, it is just a change in status plus resetting the retry count to 0, so that our retry mechanism can pick it up and try again in the next iteration, so it should be a pure PATCH request:
PATCH retry/{id}
{state: 'IDLE'}
But my colleague opposes it to be a POST request as this is a pure action and should be treated as POST.
I am not convinced because we are not creating any new resource but just updating existing resource that our REST server already knows about it.
I would like to know, and to be corrected if I am wrong here.
Any suggestions/advices are welcome.
Thanks in advance.

Any suggestions/advices are welcome.
The reference implementation of the REST architectural style is the world wide web. The world wide web is built on a foundation of URI, HTTP, and HTML -- and HTML form processing is limited to GET and POST.
So POST must be an acceptable answer. After all, the web was catastrophically successful.
PATCH, like PUT, allows you to communicate changes to a representation of a resource. Its semantics are more specific than those of POST, which allows clients to take better advantage of them. So if all you are doing is creating a message that describes local edits to the representation of the resource, then PATCH is a fine choice.
Don't overlook the possibilities of PUT -- if the size of the complete representation of the resource is of roughly the same order as the representation of your PATCH document, then using PUT may be a better choice, because of the idempotent semantics.
I am not convinced because we are not creating any new resource but just updating existing resource that our REST server already knows about it.
POST is much more general than "create a new resource". Historically, there has been a lot of confusion around this point (the language in the early HTTP specifications didn't help).

HTTP Basics
PATCH
What is PATCH actually? PATCH is an HTTP method defined in RFC 5789 that is similar to patching code in software engineering, where a change to one or multiple sources is applied in order to transform the target resource into a desired outcome. The client calculates a set of instructions that the target system has to apply fully in order to generate the requested outcome. These instructions are usually called a "patch"; in the words of RFC 5789, such a set of instructions is called a "patch document".
RFC 5789 does not define in which representation such a patch document needs to be transferred from one system to the other. For JSON-based representations, application/json-patch+json (RFC 6902) can be used. It contains operations like add, replace, move, copy, ... whose names are more or less self-explanatory, though the RFC also describes each of the available operations further.
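To make the json-patch model concrete, here is a hedged Python sketch that applies a subset of RFC 6902 (only `add`, `remove` and `replace`; `move`, `copy` and `test` are left out). It is not a complete or production implementation; `apply_json_patch` is a hypothetical name. It works on a deep copy, so a failing operation leaves the original document untouched, which mirrors the all-or-nothing requirement for patch documents.

```python
import copy
from typing import Any, List


def _parse_pointer(pointer: str) -> List[str]:
    # RFC 6901 JSON Pointer: "" is the whole document, "/a/b" -> ["a", "b"].
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # Unescape "~1" before "~0", as RFC 6901 requires.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _array_index(token: str, length: int, allow_end: bool) -> int:
    if allow_end and token == "-":
        return length  # "-" means "after the last element" for add
    if not token.isdigit():
        raise ValueError(f"invalid array index: {token!r}")
    index = int(token)
    if index > (length if allow_end else length - 1):
        raise IndexError(f"array index out of range: {index}")
    return index


def apply_json_patch(doc: Any, operations: List[dict]) -> Any:
    """Apply add/remove/replace operations; all of them or none."""
    result = copy.deepcopy(doc)
    for op in operations:
        tokens = _parse_pointer(op["path"])
        if not tokens:
            raise ValueError("whole-document operations are not supported here")
        parent = result
        for token in tokens[:-1]:
            if isinstance(parent, list):
                parent = parent[_array_index(token, len(parent), False)]
            else:
                parent = parent[token]
        key, kind = tokens[-1], op["op"]
        if isinstance(parent, list):
            if kind == "add":
                parent.insert(_array_index(key, len(parent), True), op["value"])
            elif kind == "remove":
                del parent[_array_index(key, len(parent), False)]
            elif kind == "replace":
                parent[_array_index(key, len(parent), False)] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind!r}")
        else:
            if kind == "add":
                parent[key] = op["value"]  # adds or overwrites the member
            elif kind in ("remove", "replace"):
                if key not in parent:
                    raise KeyError(key)  # target must exist for these ops
                if kind == "remove":
                    del parent[key]
                else:
                    parent[key] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind!r}")
    return result
```

Note that the order of operations matters here: each one sees the result of the previous one, which is the control over sequencing that merge-patch lacks.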
A further JSON-based, but totally different, take on how to tell a system to change a resource (or document) is captured in application/merge-patch+json (RFC 7386). In contrast to json-patch, this media type defines a set of default rules to apply when receiving a JSON-based representation for the actual target resource. Here, a single JSON representation of the modified state is sent to the server that only contains the fields and objects that should be changed by the server. The default rules define that fields to be removed from the target resource need to be set to null in the request, while fields that should change need to contain the new value to apply. Fields that remain unchanged can be left out of the request.
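RFC 7386 specifies these rules as a short recursive MergePatch algorithm. The following is a Python transcription of it (the function name `merge_patch` is our own). It copies each object it modifies instead of mutating the caller's data, and applies it to the WakeUpLan example from the question.

```python
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7386 JSON Merge Patch and return the result."""
    if isinstance(patch, dict):
        # A non-object target is replaced by an empty object first.
        target = dict(target) if isinstance(target, dict) else {}
        for name, value in patch.items():
            if value is None:
                target.pop(name, None)  # null means "remove this member"
            else:
                target[name] = merge_patch(target.get(name), value)
        return target
    # Any non-object patch value (string, number, array...) replaces the target.
    return patch


wake_up_lan = {
    "id": 1,
    "retries": {"idle": 5, "wakeup": 0, "process": 0, "shutdown": 0},
    "status": "FAILED",
}
restarted = merge_patch(wake_up_lan, {"status": "IDLE", "retries": {"idle": 0}})
```

One limitation follows directly from the rules: since null means "remove", a merge patch cannot set a field to the JSON value null.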
If you read through RFC 5789, though, merge-patch may come across as more of a hack. Compared to json-patch, a merge-patch representation lacks control over the order in which the instructions are applied (which might not always be necessary), as well as the ability to change multiple, different resources at once.
PATCH itself is not idempotent. For a json-patch document it is pretty clear that applying the same instructions multiple times may lead to different results, e.g. if you remove the first element of an array. A merge-patch document here behaves similarly to a "partial PUT" request, which so many developers perform out of pragmatism, even though the actual operation still does not guarantee idempotency. In order to avoid unintentionally applying the same patch to the same resource multiple times, e.g. due to network errors while transmitting the patch document, it is recommended to use PATCH alongside conditional requests (RFC 7232). This guarantees that the changes are only applied to a specific version of the resource; if that resource has changed, either through a previous request or through an external source, the request is declined to prevent data loss. This is basically optimistic locking.
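A minimal server-side Python sketch of this optimistic locking, under stated assumptions: the ETag is a version counter (real servers may hash the representation instead), only a single exact ETag in If-Match is handled (no `*` or lists), and requiring a condition at all (428, from RFC 6585) is a design choice. `VersionedResource` and `handle_conditional_patch` are hypothetical names.

```python
from typing import Callable, Optional


class VersionedResource:
    """In-memory resource whose ETag is a version counter (assumption)."""

    def __init__(self, state: dict):
        self.state = state
        self.version = 1

    @property
    def etag(self) -> str:
        return f'"{self.version}"'


def handle_conditional_patch(
    resource: VersionedResource,
    if_match: Optional[str],
    apply_patch: Callable[[dict], dict],
) -> int:
    """Return the HTTP status code for a PATCH guarded by If-Match."""
    if if_match is None:
        return 428  # Precondition Required: this server insists on a condition
    if if_match != resource.etag:
        return 412  # Precondition Failed: resource changed since the client read it
    resource.state = apply_patch(resource.state)
    resource.version += 1  # every successful write yields a new ETag
    return 204
```

Replaying a non-idempotent patch such as "remove the first tag" with the same If-Match value is then rejected with 412 instead of removing a second tag.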
A requirement that all patch documents have to fulfill is that they are applied atomically: either all the changes are applied or none at all. This puts some transactional burden onto the service provider.
POST
The POST method is defined in RFC 7231 as:
requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
This is basically a get-out-of-jail-free card that lets you do anything you want or have to do here. You are free to define the syntax and structure to receive on a certain endpoint. Most of these so-called "REST APIs" consider POST as the C in CRUD, which it can be used for, but that is an oversimplification of what it can actually do for you. HTML basically only supports POST and GET operations, so POST requests are used for sending all kinds of data to the server: to kick off backend processes, to create new resources such as blog posts, Q&As, videos, ..., but also to delete or update things.
The rule of thumb here is: if a new resource is created as the outcome of a POST request on a certain URI, the response code should be 201 Created, with a Location response header whose value is a URI pointing to the newly created resource. In any other case, POST does not map to the C (create) of the CRUD stereotype.
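The rule of thumb can be sketched with two hypothetical POST handlers for the WakeUpLan example, written in Python as plain functions returning `(status, headers, body)` rather than using a real web framework. Only the one that creates something answers 201 with a Location header; the retry trigger answers 202 Accepted, assuming the retry is picked up asynchronously by the retry mechanism. The URIs are illustrative, not prescribed.

```python
import itertools
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], dict]


class WakeUpLanEndpoints:
    def __init__(self) -> None:
        self.resources: Dict[int, dict] = {}
        self._ids = itertools.count(1)

    def post_collection(self, body: dict) -> Response:
        # POST /wakeuplan: a new resource is created, so answer 201 Created
        # with a Location header that points at it.
        resource_id = next(self._ids)
        self.resources[resource_id] = {**body, "id": resource_id}
        return 201, {"Location": f"/wakeuplan/{resource_id}"}, self.resources[resource_id]

    def post_retry(self, resource_id: int) -> Response:
        # POST /wakeuplan/{id}/retry: processing on an existing resource;
        # nothing new is created, so no 201 and no Location header.
        resource = self.resources.get(resource_id)
        if resource is None:
            return 404, {}, {}
        resource["status"] = "IDLE"
        return 202, {}, resource
```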
REST-related
REST isn't a protocol but an architectural style. As Robert C. Martin (Uncle Bob) stated, architecture is about intent, and REST's intent is to decouple clients from servers, which allows the latter to evolve freely by minimizing interoperability issues caused by changes introduced by the server.
These are very strong benefits if your system should still work in decades to come. Unfortunately, however, these benefits are not obtained easily. As outlined in Fielding's dissertation, to benefit from REST the mentioned constraints need to be followed strictly; otherwise couplings will remain, increasing the likelihood of breaking clients due to changes. Fielding later ranted about people who either did not read or did not understand his dissertation, and clarified in a nutshell what a REST API has to do.
This rant can be summarized in the following points:
The API should adhere to, and not violate, the underlying protocol. Although REST is used via HTTP most of the time, it is not restricted to this protocol.
Strong focus on resources and their presentation via media types.
Clients should not have initial knowledge of, or assumptions about, the available resources or their returned state ("typed" resources) in an API, but learn them on the fly via issued requests and responses that teach clients what they can do next. This gives the server freedom over its namespace, so it can move things around as needed without negatively impacting clients.
Based on this, REST is about using well-defined standards and adhering to the semantics of the protocols used as transport facilities. Through the use of HATEOAS and stateless communication, the concepts that made the Web scalable and evolution-friendly, the same interaction model that humans use on the Web is now used by applications in a REST architecture.
Common media types provide the affordance of what a system might be able to do with data received in that payload, while content-type negotiation guarantees that both sender and receiver are able to process and understand the payload correctly. The affordance may differ from media type to media type. A payload received as image/png might be rendered and shown to the user, while an application/vnd.acme-form+json payload might define a form in which the server teaches a client the elements of a request the server supports; the client can enter data and issue the request without having to actively know the method to use or the target URI to send the data to, as these are already given by the server. This not only removes the need for out-of-band (external) API documentation, but also the need for a client to parse or interpret URIs, as they are all provided by the server, accompanied by link relations that should either be standardized by IANA, follow common conventions such as existing rel values from microformats or ontologies like Dublin Core, or represent extension types as defined in RFC 5988 (Web Linking).
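A rough Python sketch of a form-driven client, assuming a made-up form structure loosely in the spirit of the application/vnd.acme-form+json example (the keys `method`, `href`, `contentType`, `fields`, `options` and the URI are hypothetical). The client takes the method, target URI and media type from the form instead of hard-coding them.

```python
from typing import Dict, Tuple

# Hypothetical form a server might return; the server decides the method,
# the target URI, the media type and which fields/values it accepts.
EXAMPLE_FORM = {
    "method": "PATCH",
    "href": "https://api.example.com/wakeuplan/1",
    "contentType": "application/merge-patch+json",
    "fields": [{"name": "status", "options": ["IDLE"]}],
}


def build_request_from_form(
    form: dict, values: Dict[str, str]
) -> Tuple[str, str, str, dict]:
    """Turn user input into (method, uri, content_type, body) using only the form."""
    allowed = {f["name"]: f for f in form["fields"]}
    body = {}
    for name, value in values.items():
        field = allowed.get(name)
        if field is None:
            raise ValueError(f"form does not accept field {name!r}")
        options = field.get("options")
        if options is not None and value not in options:
            raise ValueError(f"{value!r} is not an allowed value for {name!r}")
        body[name] = value
    return form["method"], form["href"], form["contentType"], body
```

If the server later moves the resource or switches from PATCH to POST, only the form it serves changes; the client code stays the same.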
Question-related
With the introduction done, I hope that for a question like
But my colleague opposes it to be a POST request as this is a pure action and should be treated as POST. I am not convinced because we are not creating any new resource but just updating existing resource that our REST server already knows about it
it is clear that there is no definite yes-or-no answer, but rather an "it depends".
There are a couple of questions that could be asked, e.g.:
How many (different) clients will use the service? Are they all under your control? If so, you don't need REST, though you can still aim for it.
How is the client taught or instructed on how to perform the update? Will you provide external API documentation? Will you support a media type that supports forms, such as HTML, hal-forms, halo+json, Ion or Hydra?
In general, if you have multiple clients, especially ones that are not under your control, you might not know which capabilities they support. Here, content-type negotiation is an important part. If a client supports application/json-patch+json, it might also be able to calculate a patch document containing the instructions to apply to the target resource. It is also very likely to support PATCH, as RFC 6902 mentions it. In such a case it would make sense to provide a PATCH endpoint the client can send the request to.
If the client supports application/merge-patch+json, one might assume that it supports PATCH as well, as that media type is primarily intended for use with the HTTP PATCH method, according to RFC 7386. Here the update is rather trivial from the client's perspective, as the updated document is sent as-is to the server.
In any other case, though, it is less clear in which representation format the changes will be transmitted to the server. Here, POST is probably the way to go. From a REST standpoint, an update here probably has to be similar to an update of data edited in a Web form in your browser: the current content is loaded into each form element, the client modifies these form elements to its liking and then submits the changes back to the server, probably in an application/x-www-form-urlencoded (or similar) structure. In such a case, though, PUT would probably be more appropriate, as you'd transmit the whole updated state of the resource back to the service and therefore perform a full update rather than a partial update of the target resource. The actual media type the form submits is probably defined in the media type of the respective form. Note that this does not mean that you can't also process json-patch or merge-patch documents via POST.
The rule of thumb here would be, the more media-type formats and HTTP methods you support, the more likely different clients will be able to actually perform their task.
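Supporting several patch formats can be sketched as a dispatch table keyed by the request's Content-Type, in Python. RFC 5789 says a server that receives an unsupported patch format can answer 415 Unsupported Media Type and should include an Accept-Patch header listing the formats it does support. The handler below is a deliberately simplified, top-level-only merge (full RFC 7386 merges nested objects recursively); `dispatch_patch` and `PATCH_HANDLERS` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict, dict], dict]


def top_level_merge(state: dict, patch: dict) -> dict:
    # Simplified merge-patch: top-level keys only; null removes a key.
    result = dict(state)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


# Register one handler per supported patch media type.
PATCH_HANDLERS: Dict[str, Handler] = {
    "application/merge-patch+json": top_level_merge,
}


def dispatch_patch(
    state: dict, content_type: str, patch: dict
) -> Tuple[int, Dict[str, str], dict]:
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";")[0].strip().lower()
    handler = PATCH_HANDLERS.get(media_type)
    if handler is None:
        # Tell the client which patch formats this resource accepts.
        return 415, {"Accept-Patch": ", ".join(PATCH_HANDLERS)}, state
    return 200, {}, handler(state, patch)
```

Adding json-patch support later would mean registering one more entry; clients that only speak merge-patch keep working.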

I would say you're in the right since you are not creating any new resource.
Note the part that says to use PUT when you modify the entire existing resource, and PATCH when you modify one component of an existing resource.
More here
https://restfulapi.net/rest-put-vs-post/

What is meaning of "SOAP requires more bandwidth and resource than REST"?

What is the meaning of "SOAP requires more bandwidth and resource than REST" and "REST requires less bandwidth and resource than SOAP"? What bandwidth and resources is the author referring to?

REST isn't a JSON-based exchange of data but a technique to decouple clients from servers. The decoupling is achieved by utilizing the well-defined operations of a transport protocol, which form the common interface for the message exchange, and by relying on well-defined, intermediary media types that describe the syntax and semantics of the data exchanged. There is, however, no indication that data exchanged by applications that follow the REST model needs less bandwidth (i.e. smaller payloads) than data exchanged via SOAP.
Why? A server that adheres to the REST principles will include many more options a client can use to take further actions, which (may) bloat the actual response and may actually require more bandwidth than RPC messages exchanged via SOAP. A quote like the one from the author you are referring to should thus be treated with special care. In addition, if a payload is transferred in an XML representation, both approaches have literally the same overhead in the actual exchanged syntax. Sure, SOAP introduces the SOAP envelope, though this is mainly used to specify certain required capabilities like transaction support or the like.
The author is probably basing his statement on what plenty of people consider REST but is actually RPC via HTTP with JSON payloads. Plenty of so-called REST APIs are just JSON-based Web APIs that more or less adhere to the HTTP operation semantics but dictate to clients how to use their services, exchanging proprietary JSON payloads in application/json format. As this media type is pretty generic and does not help clients determine the semantics of the payload, a client can't really make sense of such a response format unless the knowledge is already coded into the client, which tightly couples it to the API and may break it if the server ever decides to return a slightly different representation (due to updates or the like). Such representations are usually tailor-made for the API and do not contain additional URIs or hints on further actions, as the knowledge is already built into the client (similar to SOAP RPC).
I hope you can see that such a statement should be treated with care when comparing REST (in its true meaning) with SOAP in terms of message payload size. A server that provides clients with every possible option, so they can decide which action to perform, can be rather chatty.
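A quick Python sketch of that chattiness: the same hypothetical user serialized RPC-style and with hypermedia controls. The `_links` block is loosely HAL-like, but the `method` attribute and all URIs are made up for illustration. Measuring compact UTF-8 JSON shows the controls are pure additional bytes on the wire.

```python
import json

# Hypothetical RPC-style representation: only the data itself.
rpc_style = {"id": 1, "name": "Alice", "status": "FAILED"}

# Same data plus hypermedia controls telling the client what it can do next.
hypermedia_style = {
    **rpc_style,
    "_links": {
        "self": {"href": "/users/1"},
        "retry": {"href": "/users/1/retry", "method": "POST"},
        "edit": {"href": "/users/1", "method": "PATCH"},
        "delete": {"href": "/users/1", "method": "DELETE"},
    },
}


def payload_size(doc: dict) -> int:
    # Bytes on the wire for compact UTF-8 JSON (no extra whitespace).
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


print(payload_size(rpc_style), payload_size(hypermedia_style))
```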

SOAP requires more network bandwidth and resources.
One of the most important reasons is the higher overhead (SOAP: XML serialization).

PROCESS in Uniform Interface vs HTTP verbs

Considering the application of REST principles on the web:
I am doing a case study on REST and I have some doubts, mostly about the uniform interface.
Assume that the uniform interface has only one single PROCESS method instead of HTTP verbs (e.g. GET, POST, PUT, DELETE, HEAD, ...). Is there any potential consequences of this kind of process with conventional HTTP verbs?

Is there any potential consequences of this kind of process with conventional HTTP verbs?
There are a few.
One consideration is safety. RFC 7231 defines "safe" this way:
Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource.
So if PROCESS were a safe verb, like GET, you would have an analog of the read-only web. The HTTP spec also defines HEAD and OPTIONS (which are optimized reads) and TRACE (a debugging tool); the fact that HTML has been an extremely successful hypermedia format without supporting these methods suggests that they aren't particularly critical in themselves.
A safe specification of PROCESS preserves all of the scaling benefits of REST. But its utility is limited: clients can consume content, but they can't produce any.
On the other hand, if PROCESS isn't safe, then a bunch of use cases can no longer be supported. Prefetching content is no longer an option, because the components can no longer assume that invoking PROCESS has no side effects on the server. Crawling is no longer an option, for the same reason.
It's probably worth noticing the mechanics of methods in the web -- it's the hypermedia format that describes which methods are appropriate for which links. So you could potentially work around some of the issues by defining the restrictions on the method within the hypermedia format itself. It's a different way of communicating the same information to any component that can consume the media type in question.
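As a hedged Python sketch of what that looks like for a crawler or prefetcher: it may only follow links whose declared method is known to be safe. The `{"href": ..., "method": ...}` link shape is a hypothetical hypermedia format; links without a declared method default to GET, as plain HTML anchors do. A single unknown verb like PROCESS cannot be assumed safe, so it is skipped.

```python
from typing import Dict, List

# Safe methods per RFC 7231, section 4.2.1.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def links_safe_to_prefetch(links: List[Dict[str, str]]) -> List[str]:
    """Return the hrefs a crawler may follow without causing side effects."""
    return [
        link["href"]
        for link in links
        if link.get("method", "GET").upper() in SAFE_METHODS
    ]
```

Note that this only works for components that understand the hypermedia format carrying the `method` attribute, which is the first of the considerations that follow.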
But there are at least two additional considerations. First, the information in the links can only be understood by components that know that media type. On the web, most components are media-type agnostic: caching HTML looks exactly like caching JSON, which looks exactly like caching JPEG.
Second, the information in the links is only available on the outbound side. REST means stateless: all of the information needed to process the request is contained within the request. That implies that the request must include within it all of the information needed to handle a communication failure.
For instance, the HTTP spec defines idempotent:
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
This property is important for intermediary components when they forward a request over an unreliable network and receive no response from the server. We have no way to know whether a message is lost or just slow, and no way to distinguish the request message being lost from the response message being lost.
If the request includes the information that it is idempotent, then the intermediaries know that they can resend the message to the server, rather than reporting the error to the client.
Contrast this with the correct handling of POST in HTTP: since the POST request carries no idempotent marker, the components do not know whether resending the message will have the desired effect (which is why web browsers typically display a warning if you try to POST a form twice).
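A minimal Python sketch of the intermediary's side of this: resend after a timeout only when the method is idempotent, and give POST exactly one attempt. The `send` callable is a hypothetical stand-in for the network hop that raises `TimeoutError` when no response arrives; the retry limit is arbitrary.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# RFC 7231, section 4.2.2: PUT, DELETE and the safe methods are idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}


def forward(method: str, send: Callable[[], T], max_attempts: int = 3) -> T:
    """Forward a request, resending after a timeout only if that is safe."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Non-idempotent methods (e.g. POST) get exactly one attempt.
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: report the failure to the client
    raise AssertionError("unreachable")
```

The method name is the only input the intermediary needs; it knows nothing about the application behind the endpoint.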
Locking yourself into a single method gives you a choice: do you want to support error recovery by intermediaries, or do you want the flexibility to support non-idempotent writes?

Use of HTTP RESTful methods GET/POST/etc. Are they superfluous?

What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
Why not just make every client use the “GET” method with any/all relevant params, headers, request bodies, JSON, etc.?
On the server side, the response to each method is custom & independently coded!
For example, what difference does it make to issue a database query via GET instead of POST?
I understand that GET is for queries that don’t change the DB (or anything else?).
And POST is for calls that do make changes.
But, near as I can tell, the RESTful standard doesn’t prevent one to code up a server response to GET and issue a stored procedure call that indeed DOES change the DB.
Vice versa… the RESTful standard doesn’t prevent one from coding up a server response to POST that issues a stored procedure call that indeed does NOT change ANYTHING!
I’m not arguing that a midtier (HTTP) “RESTlike” layer is necessary. It clearly is.
Let's say I'm wrong (and I may be). Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
The following do not directly address my questions but merely dance uncomfortably around it like an acidhead stoner at a Dead concert:
Different Models for RESTful GET and POST
RESTful - GET or POST - what to do?
GET vs POST in REST Web Service
PUT vs POST in REST
I just spent ~80 hours trying to communicate a PATCH to my REST server (older Android Java doesn't recognize the newer PATCH method, so I had to issue a stupid kludge HTTP-OVERRIDE-METHOD header). A POST would have worked fine, but the sysop wouldn't budge because he respects REST.
I just don’t understand why to bother with each individual method. They don't seem to have much impact on Idempotence. They seem to be mere guidelines. And if you "violate" these "guidelines" they give someone else a chance to point a feckless finger at you. But so what?
Aren't these guidelines more trouble than they're worth?
I'm just confused. Please excuse the stridency of my post.
Aren’t REST GET/POST/etc. methods superfluous?

What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
First, a clarification. Those aren't RESTful methods; those are HTTP methods. The web is a reference implementation (for the most part) of the REST architectural style.
Which means that the authoritative answers to your questions are documented in the HTTP specification.
But, near as I can tell, the RESTful standard doesn’t prevent one to code up a server response to GET and issue a stored procedure call that indeed DOES change the DB.
The HTTP specification designates certain methods as being safe. Informally, this means that a method is read-only; the client is not responsible for any side effects that may occur on the server.
The purpose of distinguishing between safe and unsafe methods is to allow automated retrieval processes (spiders) and cache performance optimization (pre-fetching) to work without fear of causing harm.
But you are right, the HTTP standard doesn't prevent you from changing your database in response to a GET request. In fact, it even calls out specifically a case where you may choose to do that:
a safe request initiated by selecting an advertisement on the Web will often have the side effect of charging an advertising account.
The HTTP specification also designates certain methods as being idempotent:
Of the request methods defined by this specification, PUT, DELETE, and safe request methods are idempotent.
The motivation for having idempotent methods? Unreliable networks.
Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response.
Note that the client here might not be the user agent, but an intermediary component (like a reverse proxy) participating in the conversation.
Thus, if I'm writing a user agent, or a component, that needs to talk to your server, and your server conforms to the definition of methods in the HTTP specification, then I don't need to know anything about your application protocol to know how to correctly handle lost messages when the method is GET, PUT, or DELETE.
On the other hand, POST doesn't tell me anything, and since the unacknowledged message may still be on its way to you, it is dangerous to send a duplicate copy of the message.
Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
Absolutely -- remember, the reference implementation of hypermedia is HTML, and HTML doesn't include support for PUT or DELETE. If you want to afford a hypermedia control that invokes an unsafe operation while still conforming to the HTTP and HTML standards, then POST is your only option.
Aren't these guidelines more trouble than they're worth?
Not really? They offer real value in reliability, and the extra complexity they add to the mix is pretty minimal.
I just don’t understand why to bother with each individual method. They don't seem to have much impact on idempotence.
They don't impact it, they communicate it.
The server already knows which of its resources are idempotent receivers. It's the client and the intermediary components that need that information. The HTTP specification gives you the ability to communicate that information for free to any other compliant component.
Using the maximally appropriate method for each request means that you can deploy your solution into a topology of commodity components, and it just works.
Alternatively, you can give up reliable messaging. Or you can write a bunch of custom code in your components to tell them explicitly which of your endpoints are idempotent receivers.
POST vs PATCH
Same song, different verse. If a resource supports OPTIONS, GET, and PATCH, then I can discover everything I need to know to execute a partial update, and I can do so using the same commodity implementation I use everywhere else.
Achieving the same result with POST is a whole lot more work. For instance, you need some mechanism for communicating to the client that POST has partial update semantics, and what media-types are accepted when patching a specific resource.
What do I lose by making each call on the client GET and the server honoring such just by paying attention to the request and not the method?
Conforming user-agents are allowed to assume that GET is safe. If you have side effects (writes) on endpoints accessible via GET, then the agent is allowed to pre-fetch the endpoint as an optimization -- the side effects start firing even though nobody expects it.
If the endpoint isn't an idempotent receiver, then you have to consider that the GET calls can happen more than once.
Furthermore, the user agent and intermediary components are allowed to make assumptions about caching: requests that you expect to get all the way through to the server don't, because conforming components along the way are permitted to serve replies out of their own cache.
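A toy Python sketch of such an intermediary shows why a side-effecting GET is risky: the second GET never reaches the origin. This is an illustration under heavy simplification: real caches honor Cache-Control, Expires, Vary and so on, whereas this one stores every 200 response to GET indefinitely. It does mimic the RFC 7234 rule that a successful unsafe request invalidates the cached entry for its target URI (the sketch ignores the Location and Content-Location URIs that rule also covers). The `/counter/increment` URI in the usage is hypothetical.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]


class CachingIntermediary:
    """Toy shared cache: GET responses are replayed, other methods pass through."""

    def __init__(self, origin: Callable[[str, str], Response]):
        self._origin = origin
        self._store: Dict[str, Response] = {}

    def request(self, method: str, url: str) -> Response:
        if method.upper() == "GET":
            if url in self._store:
                return self._store[url]  # the origin never sees this request
            response = self._origin(method, url)
            if response[0] == 200:
                self._store[url] = response
            return response
        # Unsafe methods always go to the origin; a non-error response
        # (2xx/3xx) invalidates the cached entry for the target URI.
        response = self._origin(method, url)
        if 200 <= response[0] < 400:
            self._store.pop(url, None)
        return response
```

If the origin increments a counter on `GET /counter/increment`, two GETs through this cache increment it only once.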
To ice the cake, you are introducing yet another risk: undefined behavior.
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
Where I believe you are coming from, though I'm not certain, is more of an RPC point of view. Client sends a message, server responds; so long as both participants in the conversation have a common understanding of the semantics of the message, does it matter if the text in the message says "GET" or "POST" or "PATCH"? Of course not.
RPC is a fantastic choice when it fits the problem you are trying to solve.
But...
RPC at web scale is hard. Can your team deliver that? Can your team deliver it cost-effectively?
On the other hand, HTTP at scale is comparatively simple; there's an enormous ecosystem of goodies, using scalable architectures, that are stable, tested, well understood, and inexpensive. The tires are well and truly kicked.
You and your team hardly have to do anything; a bit of block and tackle to comply with the HTTP standards, and from that point on you can concentrate on delivering business value while you fall into the pit of success.
