What can go wrong if we do NOT follow RESTful best practices?

TL;DR : scroll down to the last paragraph.
There is a lot of talk about best practices when defining RESTful APIs: what HTTP methods to support, which HTTP method to use in each case, which HTTP status code to return, when to pass parameters in the query string vs. in the path vs. in the content body vs. in the headers, how to do versioning, result set limiting, pagination, etc.
If you are already determined to make use of best practices, there are lots of questions and answers out there about what is the best practice for doing any given thing. Unfortunately, there appears to be no question (nor answer) as to why use best practices in the first place.
Most of the best practice guidelines direct developers to follow the principle of least surprise, which, under normal circumstances, would be a good enough reason to follow them. Unfortunately, REST-over-HTTP is a capricious standard, the best practices of which are impossible to implement without becoming intimately involved with it, and the drawback of intimate involvement is that you tend to end up with your application being very tightly bound to a particular transport mechanism. So, some people (like me) are debating whether the benefit of "least surprise" justifies the drawback of littering the application with REST-over-HTTP concerns.
A different approach examined as an alternative to best practices suggests that our involvement with HTTP should be limited to the bare minimum necessary in order to get an application-defined payload from point A to point B. According to this approach, you only use a single REST entry point URL in your entire application, you never use any HTTP method other than HTTP POST, never return any HTTP status code other than HTTP 200 OK, and never pass any parameter in any way other than within the application-specific payload of the request. The request will either fail to be delivered, in which case it is the responsibility of the web server to return an "HTTP 404 Not Found" to the client, or it will be successfully delivered, in which case the delivery of the request was "HTTP 200 OK" as far as the transport protocol is concerned, and anything else that might go wrong from that point on is exclusively an application concern, and none of the transport protocol's business. Obviously, this approach is kind of like saying "let me show you where to stick your best practices".
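To make the contrast concrete, here is a minimal sketch of what such a single-entry-point call might look like from the client side; the endpoint, operation name and payload fields are invented for illustration:

```python
import requests

# Single entry point, always POST, always 200 OK: the operation, its arguments
# and the outcome all live in the application payload; HTTP is just the envelope.
response = requests.post("https://example.org/api", json={
    "operation": "getCustomer",          # made-up operation name
    "arguments": {"customerId": 42},
})

body = response.json()                   # the transport said "200 OK"; now ask the application
if body.get("status") == "UnauthorizedException":   # application-level error, not an HTTP 401
    print("application refused the request:", body.get("message"))
else:
    print("customer:", body.get("result"))
```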
Now, there are other voices that say that things are not that simple, and that if you do not follow the RESTful best practices, things will break.
The story goes that, for example, in the event of unauthorized access you should return an actual "HTTP 401 Unauthorized" (instead of a successful response containing a JSON-serialized UnauthorizedException) because, upon receiving the 401, the browser will prompt the user for credentials. Of course this does not really hold any water, because REST requests are not issued by browsers being used by human users.
Another, more sophisticated version of the story is that there are usually proxies between the client and the server, and these proxies inspect HTTP requests and responses and try to make sense of them, so as to handle different requests differently. For example, they say, somewhere between the client and the server there may be a caching proxy, which may treat all requests to the exact same URL as identical and therefore cacheable. So, path parameters are necessary to differentiate between different resources, otherwise the caching proxy might only ever forward a request to the server once and return cached responses to all clients thereafter. Furthermore, this caching proxy may need to know that a certain request-response exchange resulted in a failure due to a particular error such as "Permission Denied", so as to not cache the response, otherwise a request resulting in a temporary error may be answered with a cached error response forever.
So, my questions are:
Besides "familiarity" and "least surprise", what other good reasons are there for following REST best practices? Are these concerns about proxies real? Are caching proxies really so dumb as to cache REST responses? Is it hard to configure the proxies to behave in less dumb ways? Are there drawbacks in configuring the proxies to behave in less dumb ways?

It's worth considering that what you're suggesting is the way that HTTP APIs used to be designed for a good 15 years or so. API designers are tending to move away from that approach these days. They really do have their reasons.
Some points to consider if you want to avoid using ReST over HTTP:
ReST over HTTP is an efficient use of the HTTP/S transport mechanism. Avoiding the ReST paradigm runs the risk of every request / response being wrapped in verbose envelopes. SOAP is an example of this.
ReST encourages client and server decoupling by putting application semantics into standard mechanisms - HTTP and XML/JSON (or other data formats). These protocols and standards are well supported by standard libraries and have been built up over years of experience. Sure, you can create your own 'unauthorized' response body with a 200 status code, but ReST frameworks just make it unnecessary, so why bother?
ReST is a design approach which encourages a view of your distributed system which focuses on data rather than functionality, and this has proven to be a useful mechanism for building distributed systems. Avoiding ReST runs the risk of focusing on very RPC-like mechanisms which have some risks of their own:
they can become very fine-grained and 'chatty'
which can be an inefficient use of network bandwidth
which can tightly couple client and server, through introducing stateful-ness and temporal coupling between requests.
and can be difficult to scale horizontally
Note: there are times when an RPC approach is actually a better way of breaking down a distributed system than a resource-oriented approach, but they tend to be the exceptions rather than the rule.
existing tools for developers make debugging / investigation of ReSTful APIs easier. It's easy to use a browser to do a simple GET, for example. And tools such as Postman or RestClient already exist for more complex ReST-style queries. In extreme situations tcpdump is very useful, as are browser debugging tools such as Firebug. If every API call has application layer semantics built on top of HTTP (e.g. special response types for particular error situations) then you immediately lose some value from some of this tooling. Building SOAP envelopes in Postman is a pain. As is reading SOAP response envelopes.
network infrastructure around caching really can be as dumb as you're asking. It's possible to get around this but you really do have to think about it, and it will inevitably involve increased network traffic in some situations where it's unnecessary. And caching responses for repeated queries is one way in which APIs scale out, so you'll likely need to 'solve' the problem of caching repeated queries yourself (i.e. reinvent the wheel).
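As a rough sketch of what "thinking about it" means in practice, here is the kind of explicit cache-control a server has to emit once it stops using status codes to carry meaning; Flask is used for brevity, and the paths and payloads are invented:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/reports/weekly")
def weekly_report():
    # A plain 200 with freshness info: shared caches may reuse it, which is the
    # cheap read scaling you normally get for free from well-behaved GETs.
    resp = jsonify(rows=[])                          # placeholder payload
    resp.headers["Cache-Control"] = "public, max-age=60"
    return resp

@app.route("/reports/secret")
def secret_report():
    # If you insist on wrapping errors in "200 OK", you must at least tell
    # intermediaries not to store the response, or the error may be replayed
    # from a cache to other clients.
    resp = jsonify(status="PermissionDenied")
    resp.headers["Cache-Control"] = "no-store"
    return resp                                      # still 200 as far as the transport is concerned
```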
Having said all that, if you want to look into a pure message-passing design for your distributed system rather than a ReSTful one, why consider HTTP at all? Why not simply use some message-oriented middleware (e.g. RabbitMQ) to build your application, possibly with some sort of HTTP bridge somewhere for Internet-based clients? Using HTTP as a pure transport mechanism involving a simple 'message accepted / not accepted' semantics seems overkill.

REST is intended for long-lived network-based applications that span multiple organizations. If you don’t see a need for the constraints, then don’t use them. -- Roy T Fielding
Unfortunately, there appears to be no question (nor answer) as to why use best practices in the first place.
When in doubt, go back to the source
Fielding's dissertation really does quite a good job at explaining how the REST architectural constraints ensure that you don't destroy the properties those constraints are designed to protect.
Keep in mind - before the web (which is the reference application for REST), "web scale" wasn't a thing; the notion of a generic client (the browser) that could discover and consume thousands of customized applications (provided by web servers) had not previously been realized.
According to this approach, you only use a single REST entry point URL in your entire application, you never use any HTTP method other than HTTP POST, never return any HTTP status code other than HTTP 200 OK, and never pass any parameter in any way other than within the application-specific payload of the request.
Yup - that's a thing, it's called RPC; you are effectively taking the web, and stripping it down to a bare message transport application that just happens to tunnel through port 80.
In doing so, you have stripped away the uniform interface -- you've lost the ability to use commodity parts in your deployment, because nobody can participate in the conversation unless they share the same interpretation of the message data.
Note: that doesn't at all imply that RPC is "broken"; architecture is about tradeoffs. The RPC approach gives up some of the value derived from the properties guarded by REST, but that doesn't mean it doesn't pick up value somewhere else. Horses for courses.
Besides "familiarity" and "least surprise", what other good reasons are there for following REST best practices?
Cheap scaling of reads - as your offering becomes more popular, you can service more clients by installing a farm of commodity reverse-proxies that will serve cached representations where available, and only put load on the server when no fresh representation is available.
Prefetching - if you are adhering to the safety provisions of the interface, agents (and intermediaries) know that they can download representations at their own discretion without concern that the operators will be liable for loss of capital. AKA - your resources can be crawled (and cached)
Similarly, use of idempotent methods (where appropriate) communicates to agents (and intermediaries) that retrying the send of an unacknowledged message causes no harm (for instance, in the event of a network outage).
Independent innovation of clients and servers, especially cross organizations. Mosaic is a museum piece, Netscape vanished long ago, but the web is still going strong.
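To make the read-scaling point above concrete, here is a client-side sketch of conditional GETs: because GET is safe and responses carry standard validators, any commodity cache or client can revalidate a representation without understanding the payload (the URL is hypothetical):

```python
import requests

URL = "https://api.example.org/articles/17"      # hypothetical resource

first = requests.get(URL)
etag = first.headers.get("ETag")                 # validator supplied by the origin server

# Later revalidation: a conforming cache or client can do this generically,
# with no knowledge of the application's payload format.
headers = {"If-None-Match": etag} if etag else {}
later = requests.get(URL, headers=headers)

if later.status_code == 304:
    print("representation unchanged; reuse the cached copy")
else:
    print("fresh representation received")
```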
Of course this does not really hold any water, because REST requests are not issued by browsers being used by human users.
Of course they are -- where do you think you are reading this answer?
So far, REST works really well at exposing capabilities to human agents; which is to say that the server side is so ubiquitous at this point that we hardly think about it any more. The notion that you -- the human operator -- can use the same application to order pizza, run diagnostics on your house, and remote start your car is as normal as air.
But you are absolutely right that replacing the human still seems a long ways off; there are various standards and media types for communicating semantic content of data -- the automated client can look at markup, identify a phone number element, and provide a customized array of menu options from it -- but building into agents the sorts of fuzzy intelligence needed to align offered capabilities with goals, or to recover from error conditions, seems to be a ways off.

Related

Is it a good practice on a REST api to check for unexpected parameters in the body?

I'm a junior developer working on my first job.
We encountered an error in our application because a teammate misused an endpoint we made: a typo in an optional parameter in a POST body led the backend to continue as if the optional parameter was not set.
I'm wondering what is usually the best approach to prevent these kinds of user errors. Is it bad practice to have endpoints check that they only receive the request-body data they expect, with no extra fields?
Is it a good practice on a REST api to check for unexpected parameters in the body?
Maybe. More degrees of freedom is good for compatibility, but that benefit needs to be measured against the increased complexity.
You can think of the body of the HTTP request as a message, with a schema - that schema might be implicit or explicit, it might be a standardized message or something bespoke.
Problem: how confident are we that schema won't need to change? How expensive is it to coordinate a change between the server and the remote client(s)?
One way to keep the costs of future change in check is to design your message processing models such that new clients can communicate with old servers, and new servers can communicate with old clients.
The simplest processing model that enables compatible changes is to ignore content that is not understood. -- David Orchard, 2003
You can still report on unrecognized fields, which would give server operators a mechanism for detecting clients with typos in their schema implementations. A possibly friendlier operational experience, at the cost of more code. Trade offs.
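A minimal sketch of that middle ground, with invented field names: accept and ignore what you don't understand, but log it so operators can spot client typos.

```python
import logging

logger = logging.getLogger("api")

EXPECTED_FIELDS = {"name", "email", "notify"}    # hypothetical schema for this endpoint

def parse_request(body: dict) -> dict:
    """Keep only the fields we understand, but report the ones we don't."""
    unknown = set(body) - EXPECTED_FIELDS
    if unknown:
        # Don't fail the request (old servers stay compatible with newer clients),
        # but leave a trail so a typo like "notfy" is easy to spot in the logs.
        logger.warning("ignoring unrecognized fields: %s", sorted(unknown))
    return {key: value for key, value in body.items() if key in EXPECTED_FIELDS}
```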
One form of complexity that may arise when ignoring content: it becomes (somewhat?) more difficult to distinguish normal traffic from hostile traffic. See Dan Bergh Johnsson's work on Secure By Design.

Is there standard way of making multiple API calls combined into one HTTP request?

While designing REST APIs, I occasionally face the challenge of dealing with batch operations (e.g. deleting or updating many entities at once) to reduce the overhead of many TCP client connections. In this particular situation the problem is usually solved by adding a custom API method for the specific operation (e.g. POST /files/batchDelete, which accepts the ids in the request body), which doesn't look pretty from the point of view of REST API design principles but does the job.
But a general solution to the problem is still desirable to me. Recently I found the Google Cloud Storage JSON API batching documentation, which looks like a pretty general solution: a similar format could be used for any HTTP API, not just Google Cloud Storage. So my question is: does anybody know of a general standard (a standard or draft, guideline, community effort, or similar) for combining multiple API calls into one HTTP request?
I'm aware of the capabilities of HTTP/2, which include using a single TCP connection for multiple HTTP requests, but my question is aimed at the application level. In my opinion that still makes sense, because handling this at the application level seems to be the only way to guarantee the behaviour for any client, including HTTP/1.x clients, which are currently the most widely used.
TL;DR
Neither REST nor HTTP is ideal for batch operations.
Caching, which is one of REST's constraints (mandatory rather than optional), usually gets in the way of batch processing in some form.
It might be beneficial not to expose the data to be updated or removed in batch as resources of their own, but as data elements within a single resource, like a data table in an HTML page. There, updating or removing all or some of the entries should be straightforward.
If the system in general is write-intensive it is probably better to think of other solutions such as exposing the DB directly to those clients to spare a further level of indirection and complexity.
Utilization of caching may take a lot of load off the server and even spare unnecessary connections
To start with, neither REST nor HTTP is ideal for batch operations. As Jim Webber pointed out, the application domain of HTTP is the transfer of documents over the Web. This is what HTTP does and this is what it is good at. However, any business activity we perform is just a side effect of that document management, and we have to come up with solutions that turn these document-management side effects into something useful.
As REST is just a generalization of the concepts used in the browsable Web, it is no miracle that the same concepts that apply to Web development also apply to REST development in some form. A question about how something should be done in REST therefore usually resolves to the question of how the same thing is done on the Web.
As mentioned before, HTTP isn't ideal in terms of batch processing actions. Sure, a GET request may retrieve multiple results, though in reality you obtain one response containing links to further resources. The creation of a resource has to be indicated, according to the HTTP specification, with a Location header that points to the newly created resource. POST is defined as an all-purpose method that allows tasks to be performed according to server-specific semantics, so you could basically use it to create multiple resources at once. However, the HTTP spec clearly lacks support for indicating the creation of multiple resources at once, as the Location header may only appear once per response and may only contain a single URI. So how can a server indicate the creation of multiple resources to the client?
A further indication that HTTP isn't ideal for batch processing is that a URI must reference a single resource. That resource may change over time, though the URI can't ever point to multiple resources at once. The URI itself is, more or less, used as a key by caches, which store a cacheable response representation for that URI. As a URI may only ever reference one single resource, a cache will also only ever store the representation of one resource for that URI. A cache will invalidate a stored representation for a URI if an unsafe operation is performed on that URI. In the case of a DELETE operation, which is unsafe by nature, the representation for the URI the DELETE is performed on will be removed. If you now "redirect" the DELETE operation to remove multiple backing resources at once, how should a cache take notice of that? It only operates on the URI invoked. Hence, even when you delete multiple resources in one go via DELETE, a cache might still serve clients with outdated information, as it simply didn't take notice of the removal yet and its freshness value would still indicate a fresh-enough state. Unless you disable caching by default, which somewhat violates one of REST's constraints, or reduce the time period a representation is considered fresh enough to a very low value, clients will probably be served outdated information. You could of course then perform an unsafe operation on each of these URIs to "clear" the cache, though in that case you could just as well have invoked the DELETE operation on each resource you wanted to batch-delete in the first place.
It gets a bit easier, though, if the batch of data you want to remove is not explicitly captured via its own resources but as data of a single resource. Think of a data table on a Web page with certain form elements, such as a checkbox you can click to mark an entry as a delete candidate; after invoking the submit button, the selected elements are sent to the server, which performs the removal of these items. Here only the state of one resource is updated, and thus a simple POST, PUT or even PATCH operation can be performed on that resource's URI. This also goes well with caching, as outlined before, as only one resource has to be altered, and the use of unsafe operations on that URI automatically invalidates any stored representation of it.
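A sketch of that single-resource style, mirroring the POST /files/batchDelete shape from the question (the URL and payload are illustrative):

```python
import requests

# Instead of issuing DELETE against many /files/{id} URIs, the client updates
# the state of one collection-like resource with a single unsafe request.
resp = requests.post(
    "https://api.example.org/files/batchDelete",   # invented endpoint, as in the question
    json={"ids": [17, 23, 42]},
)
resp.raise_for_status()

# Only one URI was invoked, so a cache invalidates just that entry; any cached
# representations of the individual files elsewhere are not automatically
# invalidated -- the trade-off described above.
```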
The above-mentioned usage of form elements to mark certain entries for removal depends, however, on the media type used. In the case of HTML, its forms section specifies the available components and their affordances. An affordance is the knowledge of what you can and should do with certain objects: a button or link is meant to be pushed, a text field may expect numeric or alphanumeric input which may further be length-limited, and so on. Other media types, such as hal-forms, halform or ion, attempt to provide form representations and components for a JSON-based notation; however, support for such media types is still quite limited.
As one of your concerns is the number of client connections to your service, I assume you have a write-intensive scenario, as in read-intensive cases caching would probably take away a good chunk of load from your server. For example, the BBC once reported that they could reduce the load on their servers drastically just by introducing a one-minute caching interval for recently requested resources. This mainly affected their start page and the linked articles, as people clicked on the latest news more often than on old news. On receiving a couple of thousand, if not hundreds of thousands of, requests per minute, they could, as mentioned before, significantly reduce the number of requests actually reaching the server and therefore take a huge load off their servers.
Write-intensive use cases, however, can't benefit from caching as much as read-intensive ones, as the cache would get invalidated quite often and the actual requests forwarded to the server for processing. If the API is more or less used to perform CRUD operations, as so many "REST" APIs do in reality, it is questionable whether it wouldn't be preferable to expose the database directly to the clients. Almost all modern database vendors ship with sophisticated user-rights management options and allow views to be created that can be exposed to certain users. The "REST API" on top of it basically just adds a further level of indirection and complexity in such a case. By exposing the DB directly, performing batch updates or deletions shouldn't be an issue at all, as support for such operations should already be built into the DB layer through the respective query languages.
With regard to the number of connections clients create: HTTP from 1.0 on allows connections to be reused via the Connection: keep-alive header directive. In HTTP/1.1 persistent connections are used by default unless explicitly requested to close via the respective Connection: close header directive. HTTP/2 introduced full-duplex connections that allow many channels, and therefore requests, to reuse the same connection at the same time. This is more or less a fix for the connection limitation suggested in RFC 2616, which plenty of Web developers worked around by using CDNs and similar tricks. Currently most implementations use a maximum limit of 100 channels, and therefore simultaneous downloads, per single connection AFAIK.
Usually opening and closing a connection takes a bit of time and server resources, and the more open connections a server has to deal with the more the system may suffer, though open connections with hardly any traffic aren't a big issue for most servers. While connection creation was usually considered the costly part, through the usage of persistent connections that factor has now moved towards the number of requests issued, hence the desire to send batch requests, which HTTP is not really made for. Again, as mentioned throughout the post, through the smart utilization of caching plenty of requests may never reach the server at all; this is probably one of the best optimization strategies to reduce the number of simultaneous requests. Probably the best advice in such a case is to have a look at which kinds of resources are requested frequently, which requests take up a lot of processing capacity, and which ones can easily be answered by utilizing caching options.
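To illustrate the connection-reuse point from the client side (the URLs are placeholders): with persistent connections, the per-request cost is dominated by the request itself rather than by connection setup.

```python
import requests

# requests.Session keeps the underlying TCP (and TLS) connection alive between
# calls, so the handshake cost is paid once rather than once per request.
with requests.Session() as session:
    for file_id in (17, 23, 42):
        r = session.delete(f"https://api.example.org/files/{file_id}")
        r.raise_for_status()
```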
reduce the overhead of many TCP client connections
If this is the crux of the issue, the easiest way to solve it is to switch to HTTP/2.
In a way, HTTP/2 does exactly what you want. You open 1 connection, and over that connection you can send many HTTP requests in parallel. Unlike batching in a single HTTP request, it's mostly transparent to clients, and requests and responses can be processed out of order.
Ultimately batching multiple operations in a single HTTP request is always a network hack.
HTTP/2 is widely available. If HTTP/1.1 is still the most used version (this might be true, but the gap is closing), this has more to do with servers not yet being set up for it than with clients.

process in Uniform Interface vs HTTP verbs

Considering the application of REST principles on the Web:
I am doing a case study on REST and I have some doubts, mostly about the uniform interface.
Suppose the uniform interface had only one single PROCESS method instead of the HTTP verbs (e.g. GET, POST, PUT, DELETE, HEAD, ...). Are there any potential consequences of this kind of PROCESS method compared with the conventional HTTP verbs?
Are there any potential consequences of this kind of PROCESS method compared with the conventional HTTP verbs?
There are a few.
One consideration is safety. In RFC 7231, safe is defined this way:
Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource.
So if PROCESS were a safe verb, like GET, you would have an analog of the read-only web. The HTTP spec also defines HEAD and OPTIONS (which are optimized reads) and TRACE (a debugging tool); the fact that HTML has been an extremely successful hypermedia format without including support for these methods suggests that they aren't particularly critical in themselves.
A safe specification of PROCESS preserves all of the scaling benefits of REST. But its utility is limited - clients can consume content, but they can't produce any.
On the other hand, if PROCESS isn't safe, then a bunch of use cases can no longer be supported. Prefetching content is no longer an option, because the components can no longer assume that invoking PROCESS has no side effects on the server. Crawling is no longer an option, for the same reason.
It's probably worth noticing the mechanics of methods in the web -- it's the hypermedia format that describes which methods are appropriate for which links. So you could potentially work around some of the issues by defining the restrictions on the method within the hypermedia format itself. It's a different way of communicating the same information to any component that can consume the media type in question.
But there are at least two additional considerations. First, that the information in the links can only be understood by components that know that media type. On the web, most components are media type agnostic -- caching html looks exactly like caching json looks like caching jpeg.
Second, that the information in the links is only available on the outbound side. REST means stateless - all of the information needed to process the request is contained within the request. That implies that the request must include within it all of the information needed to handle a communication failure.
For instance, the http spec defines idempotent.
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
This property is important for intermediary components when they forward a request along an unreliable network and receive no response from the server. We've got no way to know if a message is lost, or just slow, and we've got no way to distinguish the request message being lost from the response message being lost.
If the request includes the information that it is idempotent, then the intermediaries know that they can resend the message to the server, rather than reporting the error to the client.
Contrast this with the correct handling of POST in HTTP; since the POST request does not have an idempotent marker on it, the components do not know that resending the message is going to have the desired effect (which is why web browsers typically display a warning if you try to POST a form twice).
Locking yourself into a single method forces a choice: do you want to support error recovery by intermediaries, or do you want the flexibility to support non-idempotent writes?
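A simplified sketch of the kind of generic recovery a client library or intermediary can implement purely from the method name; the policy is illustrative, not complete:

```python
import requests
from requests.exceptions import ConnectionError, Timeout

# The only thing this generic retry policy inspects is the method -- exactly
# the information a single PROCESS verb would no longer carry.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}

def send(method: str, url: str, attempts: int = 3, **kwargs) -> requests.Response:
    for attempt in range(attempts):
        try:
            return requests.request(method, url, timeout=5, **kwargs)
        except (ConnectionError, Timeout):
            # Only retry when a duplicate request cannot cause extra harm.
            if method.upper() not in IDEMPOTENT_METHODS or attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")
```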

Use of HTTP RESTful methods GET/POST/etc. Are they superfluous?

What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
Why not just make every client use the “GET” method with any/all relevant params, headers, request bodies, JSON, etc.?
On the server side, the response to each method is custom & independently coded!
For example, what difference does it make to issue a database query via GET instead of POST?
I understand that GET is for queries that don’t change the DB (or anything else?).
And POST is for calls that do make changes.
But, near as I can tell, the RESTful standard doesn’t prevent one from coding up a server response to GET that issues a stored procedure call that indeed DOES change the DB.
Vice versa… the RESTful standard doesn’t prevent one from coding up a server response to POST that issues a stored procedure call that indeed does NOT change ANYTHING!
I’m not arguing that a midtier (HTTP) “RESTlike” layer is necessary. It clearly is.
Let's say I'm wrong (and I may be). Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
The following do not directly address my questions but merely dance uncomfortably around them like an acidhead stoner at a Dead concert:
Different Models for RESTful GET and POST
RESTful - GET or POST - what to do?
GET vs POST in REST Web Service
PUT vs POST in REST
I just spent ~80 hours trying to communicate a PATCH to my REST server (older Android Java doesn't recognize the newer PATCH so I had to issue a stupid kluge HTTP-OVERIDE-METHOD in the header). A POST would have worked fine but the sysop wouldn't budge because he respects REST.
I just don’t understand why to bother with each individual method. They don't seem to have much impact on Idempotence. They seem to be mere guidelines. And if you "violate" these "guidelines" they give someone else a chance to point a feckless finger at you. But so what?
Aren't these guidelines more trouble than they're worth?
I'm just confused. Please excuse the stridency of my post.
Aren’t REST GET/POST/etc. methods superfluous?
What is the value of RESTful “methods” (ie. GET, POST, PUT, DELETE, PATCH, etc.)?
First, a clarification. Those aren't RESTful methods; those are HTTP methods. The web is a reference implementation (for the most part) of the REST architectural style.
Which means that the authoritative answers to your questions are documented in the HTTP specification.
But, near as I can tell, the RESTful standard doesn’t prevent one from coding up a server response to GET that issues a stored procedure call that indeed DOES change the DB.
The HTTP specification designates certain methods as being safe. Casually, this designates that a method is read only; the client is not responsible for any side effects that may occur on the server.
The purpose of distinguishing between safe and unsafe methods is to allow automated retrieval processes (spiders) and cache performance optimization (pre-fetching) to work without fear of causing harm.
But you are right, the HTTP standard doesn't prevent you from changing your database in response to a GET request. In fact, it even calls out specifically a case where you may choose to do that:
a safe request initiated by selecting an advertisement on the Web will often have the side effect of charging an advertising account.
The HTTP specification also designates certain methods as being idempotent
Of the request methods defined by this specification, PUT, DELETE, and safe request methods are idempotent.
The motivation for having idempotent methods? Unreliable networks
Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response.
Note that the client here might not be the user agent, but an intermediary component (like a reverse proxy) participating in the conversation.
Thus, if I'm writing a user agent, or a component, that needs to talk to your server, and your server conforms to the definition of methods in the HTTP specification, then I don't need to know anything about your application protocol to know how to correctly handle lost messages when the method is GET, PUT, or DELETE.
On the other hand, POST doesn't tell me anything, and since the unacknowledged message may still be on its way to you, it is dangerous to send a duplicate copy of the message.
Isn't it still likely that there are numerous REST servers violating the proper use of these protocols suffering ZERO repercussions?
Absolutely -- remember, the reference implementation of hypermedia is HTML, and HTML doesn't include support for PUT or DELETE. If you want to afford a hypermedia control that invokes an unsafe operation, while still conforming to the HTTP and HTML standards, then POST is your only option.
Aren't these guidelines more trouble than they're worth?
Not really? They offer real value in reliability, and the extra complexity they add to the mix is pretty minimal.
I just don’t understand why to bother with each individual method. They don't seem to have much impact on idempotence.
They don't impact it, they communicate it.
The server already knows which of its resources are idempotent receivers. It's the client and the intermediary components that need that information. The HTTP specification gives you the ability to communicate that information for free to any other compliant component.
Using the maximally appropriate method for each request means that you can deploy your solution into a topology of commodity components, and it just works.
Alternatively, you can give up reliable messaging. Or you can write a bunch of custom code in your components to tell them explicitly which of your endpoints are idempotent receivers.
POST vs PATCH
Same song, different verse. If a resource supports OPTIONS, GET, and PATCH, then I can discover everything I need to know to execute a partial update, and I can do so using the same commodity implementation I use everywhere else.
Achieving the same result with POST is a whole lot more work. For instance, you need some mechanism for communicating to the client that POST has partial update semantics, and what media-types are accepted when patching a specific resource.
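A sketch of that discovery-plus-PATCH flow from the client side; the URL is invented, while Allow, Accept-Patch and JSON merge patch are the standard mechanisms (RFC 7231, RFC 5789, RFC 7396):

```python
import json
import requests

URL = "https://api.example.org/articles/17"      # hypothetical resource

# Discovery: ask the resource what it supports.
opts = requests.options(URL)
print(opts.headers.get("Allow"))                 # e.g. "GET, PUT, PATCH, OPTIONS"
print(opts.headers.get("Accept-Patch"))          # e.g. "application/merge-patch+json"

# Partial update with a standard patch document instead of bespoke POST semantics.
resp = requests.patch(
    URL,
    data=json.dumps({"title": "New title"}),     # merge patch: only the changed field
    headers={"Content-Type": "application/merge-patch+json"},
)
resp.raise_for_status()
```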
What do I lose by making every client call a GET and having the server honor it just by paying attention to the request and not the method?
Conforming user-agents are allowed to assume that GET is safe. If you have side effects (writes) on endpoints accessible via GET, then the agent is allowed to pre-fetch the endpoint as an optimization -- the side effects start firing even though nobody expects it.
If the endpoint isn't an idempotent receiver, then you have to consider that the GET calls can happen more than once.
Furthermore, the user agent and intermediary components are allowed to make assumptions about caching -- requests that you expect to get all the way through to the server don't, because conforming components along the way are permitted to serve replies out of their own cache.
To ice the cake, you are introducing an additional risk: undefined behavior.
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
Where I believe you are coming from, though I'm not certain, is more of an RPC point of view. Client sends a message, server responds; so long as both participants in the conversation have a common understanding of the semantics of the message, does it matter if the text in the message says "GET" or "POST" or "PATCH"? Of course not.
RPC is a fantastic choice when it fits the problem you are trying to solve.
But...
RPC at web scale is hard. Can your team deliver that? Can your team deliver it cost-effectively?
On the other hand, HTTP at scale is comparatively simple; there's an enormous ecosystem of goodies, using scalable architectures, that are stable, tested, well understood, and inexpensive. The tires are well and truly kicked.
You and your team hardly have to do anything; a bit of block and tackle to comply with the HTTP standards, and from that point on you can concentrate on delivering business value while you fall into the pit of success.

RESTful interface for ECG/EEG sensor data in haskell

I'm working on a project in which I want to display biosensor EEG/ECG data measured by a portable device (e.g., a microcontroller with wireless data transmission via WiFi or Bluetooth). For this purpose, I need to interface with the portable device/microcontroller; many of these devices seem to use RESTful interfaces, but probably also offer sockets.
One example of a microcontroller with WiFi is the "spark.io", which is based on a Cortex-M3 and a CC3000 wireless controller for on-board WiFi access. The data to be transferred are around 500 to 1000 float values per second, which should arrive at the REST client with as little delay as possible. Probably a non-REST approach like sockets would fit better, but I would still like to test an approach based on a RESTful interface (a tiny argument for this would be that transferring data via a RESTful interface seems very common and has good library support).
Q: The question is, what is the best approach for a performant (in the sense of near-realtime) implementation that interfaces with this device via a REST interface?
I am sure this problem has been solved before, but I could not quickly find a paper via google scholar or technical/scientific blog post that explains this. The only link I found is on "rest hooks", but I am not sure if this is a good approach. Searching on SE didn't reveal a past question on this.
Side note: My approach would be to implement the interface in Haskell first to test the design and performance of the RESTful interface. Later the working approach should be ported to or implemented with Java/Android/spark.io/some other microcontroller.
(Please note this question is entirely about the architecture and not at all about Haskell libraries or anything. If using REST is the stupidest thing, I will accept that as an answer if it is well argued. The question is then also whether microcontroller web interfaces, and specifically their APIs like that of "spark.io", are in general a stupid idea if they are implemented via REST. Is this the case? If not, what definition of "near real time" justifies that a REST interface is a bad idea and thus other means of communication are better? Like: one sensor read per minute? Or one per second, per 1/10 second, per 1/100 second, per 1/1000 second?)
Okay, let's go through this.
REST is not necessarily a bad idea but it has a lot of features which you may not need. For example, there are REST verbs not just for retrieval, but also updating, deleting, and creating resources. If those functions are important (e.g. you need to send certain control data to the EEG controller) then REST will be nice. If you just want fast access to the stream of data, consider raw TCP instead.
Similarly, REST will package messages into "requests" and their "responses", which come with a bunch of "headers" indicating things like whether the request could be fulfilled, whether it's compressed, etc. These can be great features but may be bloat. You'll probably want to emit enough data on each request so that the ~1kB of headers is a small fraction of it. But given 8-byte floats (doubles), that means transmitting on the order of 500-1000 data points per request (4-8 kB of payload), which you've said will take about one second to accumulate. Is that our fate -- to always have 1s of latency?
REST will allow you to avoid some of that bloat by declaring a Transfer-Encoding: chunked so that the client can operate on individual chunks as they become available. So that's an architectural decision that I think will need to be made.
I would definitely get Keep-Alive working as soon as possible, and it would be my chief feature when looking for what library to use on the server. Keep-Alive is a standard extension to HTTP which avoids tearing down and rebuilding the TCP stack for each HTTP request. If you don't do this then you have some heavy protocol negotiations each time you send a request.
A crucial decision you'll have to make involves whether you want to do HTTP pipelining or not. You can combine HTTP pipelining with longer-lived requests (ones where you don't expect an immediate response) to essentially "send the data when it becomes available" (i.e. send the headers first and let the server push out the data when it's good and ready). This is an alternative to chunked transfers.
If you can work those out, then HTTP is regularly used to send megabytes per second, so your use case fits well within what REST is capable of. In terms of REST/HTTP libraries for Haskell, if you have to somehow program the controller yourself, the big options are wai, yesod, snap, and rest. If you just need an HTTP client there are a few of those too.
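Language aside (the question targets Haskell, but the HTTP mechanics are the same), here is a rough client-side sketch of consuming a chunked stream; the URL and the one-value-per-line framing are invented:

```python
import requests

# With Transfer-Encoding: chunked, the client can process samples as each chunk
# arrives instead of waiting for the full response to complete.
with requests.get("http://sensor.local/eeg/stream", stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            sample = float(line)                 # one sensor reading per line
            print(sample)
```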