Making "parse" function RESTful - rest

I have a RESTful service for managing, let's say, devices. It provides the usual functionality:
GET /devices
GET /devices/:id
POST /devices
PUT /devices/:id
DELETE /devices/:id
The device object might be defined as follows:
{
id: 123,
name: "Smoke detector",
firmware: "21.0.103",
battery: "ok",
last_maintenance: "2017-07-07",
last_alarm: "2014-02-01 12:11:10",
// ...
}
There is an application that can read the device state via some device-specific reader. The application itself has no idea how to interpret the data it reads, but it can ask the server to do so. In our case, let's assume the data contains the following: battery status, firmware version, last alarm.
If I were implementing a regular RPC service, I would create a function with "parse" semantics: it accepts the raw data and returns an updated device object (or, alternatively, only the part of the device object containing the parsed state). But I doubt I could find a good REST solution for such a function. Right now I am doing it via PATCH, but I personally do not like that solution, and therefore I will not present it here. I believe there should be a good solution for this class of problems.
So the question: how should I fit my "parse" logic into the REST paradigm?

POST it to a /parsed-device-state URL, which will return a 201 Created, a Location header pointing to the place where you can get the parsed data from, and if you like, return the parsed data in the 201 as well (along with an additional Content-Location header with the same value as the Location header). Or if it takes a long time to parse, use 202 Accepted, and the same Location header. The caller can then poll that provided location until the results are ready.
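For illustration, here is a minimal sketch of that synchronous exchange (the host, the /parsed-device-state/42 identifier and the raw payload type are assumptions; the parsed fields reuse the device example from the question):
POST /parsed-device-state HTTP/1.1
Host: api.example.com
Content-Type: application/octet-stream

(raw bytes read from the device-specific reader)

201 Created
Location: http://api.example.com/parsed-device-state/42
Content-Location: http://api.example.com/parsed-device-state/42
Content-Type: application/json

{"battery": "ok", "firmware": "21.0.103", "last_alarm": "2014-02-01 12:11:10"}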

So the question: how should I fit my "parse" logic into the REST paradigm?
How would you fit your parse logic into a web site?
You'd probably start with a bookmark. GET $BOOKMARK would return a representation of a form. The form might include an input control like a text area element that would allow the consumer to input a representation, or it might include an input control that allows the consumer to link to a file. The consumer would submit the form, and the agent would create a request from the information in the form. That would probably be a POST (you aren't likely to include an arbitrary file's representation in the query string) to whatever resource was specified as the action of the form. The server's response would provide a representation of the result.
If parsing were a particularly slow process, then the response instead might be a representation including links to resources that could be used to track the progress of the parsing. The whole protocol in this case looks a lot like putting work on a queue, and then polling for updates.
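A hedged sketch of that slow-parse variant (again, the host and the tracking URI are illustrative only): the submission is acknowledged with 202 Accepted, and the consumer polls the linked resource until the result appears.
POST /parsed-device-state HTTP/1.1
Host: api.example.com
Content-Type: application/octet-stream

(raw bytes read from the device-specific reader)

202 Accepted
Location: http://api.example.com/parsed-device-state/42

GET /parsed-device-state/42 HTTP/1.1
Host: api.example.com

200 OK
Content-Type: application/json

{"status": "done", "battery": "ok", "firmware": "21.0.103", "last_alarm": "2014-02-01 12:11:10"}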
It's the right answer to a problem that is not a great fit for HTTP:
The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction.
To some degree, what you are trying to do with your function is transfer compute, which may be why it feels like you are trimming corners off of the peg to fit it in the hole.
An alternative approach, which is a better fit for HTTP, is to think about transferring a representation of the behavior. The API client gets a function that understands how to parse apples into oranges, and then runs that code on the information it keeps locally. Think JavaScript: we get a representation of the behavior from the server (which can embed into that representation any information the server has that the client will need), and then execute the result locally. Metadata in the headers describes the lifetime of the representation, in a way that is understood by any standards-compliant cache.
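To make the "transfer the behavior" idea concrete, here is a hedged sketch (the endpoint, the cache lifetime and the parser logic are all made up; the real layout of the reader data is unknown here): the server serves a small parsing function as JavaScript with cache metadata, and the client runs it locally against the raw bytes it read from the device.
GET /device-parsers/smoke-detector.js HTTP/1.1
Host: api.example.com

200 OK
Content-Type: application/javascript
Cache-Control: max-age=86400

// Representation of behavior: executed on the client against locally read bytes.
function parseDeviceState(rawBytes) {
  // Illustrative decoding only -- a stand-in for whatever the server knows.
  return {
    battery: rawBytes[0] === 1 ? "ok" : "low",
    firmware: rawBytes[1] + "." + rawBytes[2] + "." + rawBytes[3],
    last_alarm: new Date(rawBytes[4] * 1000).toISOString()
  };
}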

Related

How to get access to the request payload in Tapir/ZIOHttp DefaultServerLog?

We are building a REST microservice with Scala 3, ZIO 2, ZIO logging and Tapir.
For context-specific logging we want to use the MDC and set an attribute there which is taken from the request payload.
Is it possible to get access to the request payload in DefaultServerLog to extract the MDC attribute and then use it for the ZIO logging MDC feature, i.e. create a LogAnnotation from the extracted attribute so that it is also logged by all DefaultServerLog methods (doLogWhenHandled etc.)? Currently this works for our own log statements, but not for those of Tapir/ZIO-HTTP.
See answer here https://softwaremill.community/t/how-to-get-access-to-the-request-payload-in-tapir-ziohttp-defaultserverlog/84/3.
Adam Warski:
"This is usually problematic as the body of the request is a stream of bytes, which is read from the socket as it arrives. That is, the request isn’t loaded into memory by default.
You can work-around this by reading the whole request into memory using serverRequest.underlying.asInstanceOf[zio.http.Request].body.asArray, extracting the required info and enriching the fiber-locals appropriately. You might also need to substitute the Request with a copy, which has the body provided as a byte array (in a “strict” form), so that the “proper” body parser doesn’t try to re-read from the network (where nothing will be available).
However, this has some downsides: the body will be parsed twice (once by your interceptor, once by the parsing that’s defined later); and it will be read into memory (which might be problematic if you don’t have a limit on the size of the body)."

How to serialize/deserialize objects sent over the network in Haskell?

I see that there are many ways to serialize/deserialize Haskell objects:
Data.Serialize -> encode, decode functions
Data.Binary http://code.haskell.org/binary/
MsgPack, JSON, BSON, etc
In my application, I want to set up a simple TCP client-server, where the client may send serialized Haskell record objects. How does one decide between these serialization alternatives?
Additionally, when objects serialized into strings are sent over the network using Network.Socket, strings are returned. Is there a slightly higher-level library that works at the level of whole TCP messages? In other words, is there a way to avoid writing parsing code on the receiving end that:
collects the results of a sequence of recv() calls,
detects that a whole object has been received, and
then parses it into a Haskell type?
In my application, the objects are not expected to be too large (maybe about ~1MB max).
As for the second part of your question, two things are required:
An incremental parser that doesn't need to have the whole document in memory to start parsing, and which can be fed with the partial chunks of data arriving from the wire. Also, when the parsing succeeds it must return any "leftover data" along with the parsed value.
A source of data with "pushback capabilities", that allows you to "unread" any leftovers so that they are available to the next parsing attempt.
The most popular library providing (1) is attoparsec. As for (2), all three main streaming libraries (conduit, io-streams, and pipes) offer some kind of pushback functionality (the latter using the auxiliary pipes-parse package). All three libraries can integrate with attoparsec parsers as well (see here, here and here).
(Another option, of course, is to prepend each message with its length and read only that exact number of bytes.)
To answer the first part of your question (about data serialization), I would say that everything you listed sounds fine. Since you are dealing with pretty big (1 MB) serializations, I think the most important thing is laziness. The cereal package (which provides the Data.Serialize module you listed) produces strict serializations, and you wouldn't want that because you'd need to build the whole thing up in memory before sending it out. I'll give a shout out to aeson (http://hackage.haskell.org/package/aeson-0.8.0.2/docs/Data-Aeson.html), which you can use with GHC Generics to get something simple like this:
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)
import Data.Aeson (FromJSON, ToJSON)
data Shape = Rect Int Int | Circle Double | Other String Int
  deriving (Generic)
instance FromJSON Shape -- uses a default
instance ToJSON Shape -- uses a default
And then, bam!, you've got access to the encode and decode functions. I don't know about a higher-level TCP library. Hopefully, someone else will have more insight on that.

Non-RESTful backend with backbone.js

I'm evaluating backbone.js as a potential JavaScript library for use in an application which will have a few different backends: WebSocket, REST, and a 3rd-party library producing JSON. I've read some opinions that backbone.js works beautifully with RESTful backends so long as the API is 'by the book' and follows the appropriate HTTP verbiage. Can someone elaborate on what this means?
Also, how much trouble is it to get backbone.js to connect to WebSockets? Lastly, are there any issues with integrating a backbone.js model with a function which returns JSON - in other words does the data model always need to be served via REST?
Backbone's power is that it has an incredibly flexible and modular structure. It means that you can use, extend, take out, or modify any part of Backbone. This includes the AJAX functionality.
Backbone doesn't "care" where you get the data for your collections or models. It will help you out by providing an out-of-the-box RESTful "ajax" solution, but it won't be mad if you want to use something else!
This allows you to find (or write) any plugin you want to handle the server interaction. Just look on backplug.io, Google, and Github.
Specifically for Sockets there is backbone.iobind.
Can't find a plugin? No worries. I can tell you exactly how to write one (it's 100x easier than it sounds).
The first thing that you need to understand is that overwriting behavior is SUPER easy. There are 2 main ways:
Globally:
Backbone.Collection.prototype.sync = function() {
  //screw you Backbone!!! You're completely useless I am doing my own thing
}
Per instance
var MySpecialCollection = Backbone.Collection.extend({
  sync: function() {
    //I like what you're doing with the ajax thing... Clever clever ;)
    // But for a few collections I wanna do it my way. That cool?
  }
});
And the only other thing you need to know is what happens when you call "fetch" on a collection. This is the "by the book"/"out of the box" behavior:
collection#fetch is triggered by the user (YOU). fetch will delegate the ACTUAL fetching (ajax, sockets, local storage, or even a function that instantly returns json) to some other function (collection#sync). Whatever function is in collection.sync has to take 3 arguments:
action: create (for creating), read (for fetching), delete (for deleting), or update (for updating) = CRUD.
context (the this variable) - if you don't know what it does, don't worry about it; it's not important for now
options - where da magic is. We only care about 1 option though
success: a callback that gets called when the data is "ready". THIS is the callback that collection#fetch is interested in, because that's when it takes over and does its thing. The only requirement is that sync passes it the following as the 1st argument:
response: the actual data it got back
Now, whatever you put in collection#sync is responsible for calling that success callback from its options once it is done getting the data. When it does, collection#fetch takes back over (via the callback it passed in as success) and does the following nifty steps:
Calls set or reset (for these purposes they're roughly the same).
When set finishes, it triggers a sync event on the collection broadcasting to the world "yo I'm ready!!"
So what happens in set? Well, a bunch of stuff (deduping, parsing, sorting, removing, creating models, propagating changes and general maintenance). Don't worry about it. It works ;) What you need to worry about is how you can hook into different parts of this process. The only two you should worry about (if your server wraps data in weird ways) are:
collection#parse for parsing a collection. Should accept the raw JSON (or whatever format) that comes from the server/ajax/websocket/function/worker/whoknowswhat and turn it into an ARRAY of objects. Takes resp (the JSON) as its 1st argument and should return the mutated response. Easy peasy.
model#parse. Same as collection, but it takes in the raw objects (i.e. imagine you iterate over the output of collection#parse) and spits out an "unwrapped" object.
Get off your computer and go to the beach because you finished your work in 1/100th the time you thought it would take.
That's all you need to know in order to implement whatever server system you want in place of the vanilla "ajax requests".
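Putting the pieces above together, a minimal, illustrative override might look like the following (getDogsFromSomewhere and the {results: [...]} wrapper are hypothetical; any source of JSON works):
var Dogs = Backbone.Collection.extend({
  sync: function (method, collection, options) {
    if (method === "read") {
      // Any data source works here: a websocket reply, a 3rd-party lib, a plain function...
      var resp = getDogsFromSomewhere(); // hypothetical function that returns JSON
      options.success(resp);             // hand it back; fetch then does set/reset and fires "sync"
    }
    // "create", "update" and "delete" could be routed elsewhere in the same way.
  },
  parse: function (resp) {
    // Unwrap {results: [...]} into the plain array Backbone expects.
    return resp.results;
  }
});

new Dogs().fetch(); // triggers the custom sync above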

Call a Server-side Method on a Resource in a RESTful Way

Keep in mind I have a rudimentary understanding of REST. Let's say I have this URL:
http://api.animals.com/v1/dogs/1/
And now, I want to make the server make the dog bark. Only the server knows how to do this. Let's say I want to have it run on a CRON job that makes the dog bark every 10 minutes for the rest of eternity. What does that call look like? I kind of want to do this:
URL request:
ACTION http://api.animals.com/v1/dogs/1/
In the request body:
{"action":"bark"}
Before you get mad at me for making up my own HTTP method, help me out and give me a better idea on how I should invoke a server-side method in a RESTful way. :)
EDIT FOR CLARIFICATION
Some more clarification around what the "bark" method does. Here are some options that may result in differently structured API calls:
bark just sends an email to dog.email and records nothing.
bark sends an email to dog.email and the increments dog.barkCount by 1.
bark creates a new "bark" record with bark.timestamp recording when the bark occured. It also increments dog.barkCount by 1.
bark runs a system command to pull the latest version of the dog code down from Github. It then sends a text message to dog.owner telling them that the new dog code is in production.
Why aim for a RESTful design?
The RESTful principles bring the features that make web sites easy (for a random human user to "surf" them) to the web services API design, so they are easy for a programmer to use. REST isn't good because it's REST, it's good because it's good. And it is good mostly because it is simple.
The simplicity of plain HTTP (without SOAP envelopes and single-URI overloaded POST services), what some may call "lack of features", is actually its greatest strength. Right off the bat, HTTP asks you to have addressability and statelessness: the two basic design decisions that keep HTTP scalable up to today's mega-sites (and mega-services).
But REST is not a silver bullet: sometimes an RPC style ("Remote Procedure Call" - such as SOAP) may be appropriate, and sometimes other needs take precedence over the virtues of the Web. This is fine. What we don't really like is needless complexity. Too often a programmer or a company brings in RPC-style services for a job that plain old HTTP could handle just fine. The effect is that HTTP is reduced to a transport protocol for an enormous XML payload that explains what's "really" going on (neither the URI nor the HTTP method gives a clue about it). The resulting service is far too complex, impossible to debug, and won't work unless your clients have the exact setup the developer intended.
In the same way that Java/C# code can fail to be object-oriented, merely using HTTP does not make a design RESTful. One may be caught up in the rush of thinking about their services in terms of actions and remote methods that should be called. No wonder this will mostly end up as an RPC-style service (or a REST-RPC hybrid). The first step is to think differently. A RESTful design can be achieved in many ways; one way is to think of your application in terms of resources, not actions:
💡 Instead of thinking in terms of actions it can perform ("do a search for places on the map")...
...try to think in terms of the results of those actions ("the list of places on the map matching the search criteria").
I'll go for examples below.
(Another key aspect of REST is the use of HATEOAS - I don't cover it here, but I talk about it quickly in another post.)
Issues of the first design
Let's take a look at the proposed design:
ACTION http://api.animals.com/v1/dogs/1/
First off, we should not consider creating a new HTTP verb (ACTION). Generally speaking, this is undesirable for several reasons:
(1) Given only the service URI, how will a "random" programmer know the ACTION verb exists?
(2) if the programmer knows it exists, how will he know its semantics? What does that verb mean?
(3) what properties (safety, idempotence) should one expect that verb to have?
(4) what if the programmer has a very simple client that only handles standard HTTP verbs?
(5) ...
Now let's consider using POST (I'll discuss why below, just take my word for it now):
POST /v1/dogs/1/ HTTP/1.1
Host: api.animals.com
{"action":"bark"}
This could be OK... but only if:
{"action":"bark"} was a document; and
/v1/dogs/1/ was a "document processor" (factory-like) URI. A "document processor" is a URI that you'd just "throw things at" and "forget" about them - the processor may redirect you to a newly created resource after the "throwing". E.g. the URI for posting messages at a message broker service, which, after the posting would redirect you to a URI that shows the status of the message's processing.
I don't know much about your system, but I'd already bet both aren't true:
{"action":"bark"} is not a document, it actually is the method you are trying to ninja-sneak into the service; and
the /v1/dogs/1/ URI represents a "dog" resource (probably the dog with id==1) and not a document processor.
So all we know now is that the design above is not so RESTful, but what exactly is wrong with it? Basically, it is bad because it is a complex URI with complex meaning. You can't infer anything from it. How would a programmer know that a dog has a bark action that can be secretly infused into it with a POST?
Designing your question's API calls
So let's cut to the chase and try to design those barks RESTfully by thinking in terms of resources. Allow me to quote the Restful Web Services book:
A POST request is an attempt to create a new resource from an existing
one. The existing resource may be the parent of the new one in a
data-structure sense, the way the root of a tree is the parent of all
its leaf nodes. Or the existing resource may be a special "factory"
resource whose only purpose is to generate other resources. The
representation sent along with a POST request describes the initial
state of the new resource. As with PUT, a POST request doesn’t need to
include a representation at all.
Following the description above we can see that bark can be modeled as a subresource of a dog (since a bark is contained within a dog, that is, a bark is "barked" by a dog).
From that reasoning we already got:
The method is POST
The resource is /barks, subresource of dog: /v1/dogs/1/barks, representing a bark "factory". That URI is unique for each dog (since it is under /v1/dogs/{id}).
Now each case of your list has a specific behavior.
1. bark just sends an e-mail to dog.email and records nothing.
Firstly, is barking (sending an e-mail) a synchronous or an asynchronous task? Secondly, does the bark request require any document (the e-mail, maybe) or is it empty?
1.1 bark sends an e-mail to dog.email and records nothing (as a synchronous task)
This case is simple. A call to the barks factory resource yields a bark (an e-mail sent) right away and the response (if OK or not) is given right away:
POST /v1/dogs/1/barks HTTP/1.1
Host: api.animals.com
Authorization: Basic mAUhhuE08u724bh249a2xaP=
(entity-body is empty - or, if you require a document, place it here)
200 OK
As it records (changes) nothing, 200 OK is enough. It shows that everything went as expected.
1.2 bark sends an e-mail to dog.email and records nothing (as an asynchronous task)
In this case, the client must have a way to track the bark task. The bark task then should be a resource with its own URI:
POST /v1/dogs/1/barks HTTP/1.1
Host: api.animals.com
Authorization: Basic mAUhhuE08u724bh249a2xaP=
{document body, if needed;
NOTE: when possible, the response SHOULD contain a short hypertext note with a hyperlink
to the newly created resource (bark) URI, the same returned in the Location header
(also notice that, for the 202 status code, the Location header meaning is not
standardized, thus the importance of a hypertext/hyperlink response)}
202 Accepted
Location: http://api.animals.com/v1/dogs/1/barks/a65h44
This way, each bark is traceable. The client can then issue a GET to the bark URI to know its current state. Maybe even use a DELETE to cancel it.
2. bark sends an e-mail to dog.email and then increments dog.barkCount by 1
This one can be trickier, if you want to let the client know the dog resource gets changed:
POST /v1/dogs/1/barks HTTP/1.1
Host: api.animals.com
Authorization: Basic mAUhhuE08u724bh249a2xaP=
{document body, if needed; when possible, containing a hypertext/hyperlink with the address
in the Location header -- says the standard}
303 See Other
Location: http://api.animals.com/v1/dogs/1
In this case, the Location header's intent is to let the client know he should take a look at the dog. From the HTTP RFC about 303:
This method exists primarily to allow the output of a
POST-activated script to redirect the user agent to a selected resource.
If the task is asynchronous, a bark subresource is needed just like the 1.2 situation and the 303 should be returned at a GET .../barks/Y when the task is complete.
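For illustration, reusing the hypothetical bark id a65h44 from the examples above, that polling exchange could look like:
GET /v1/dogs/1/barks/a65h44 HTTP/1.1
Host: api.animals.com
Authorization: Basic mAUhhuE08u724bh249a2xaP=

303 See Other
Location: http://api.animals.com/v1/dogs/1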
3. bark creates a new "bark" record with bark.timestamp recording when the bark occurred. It also increments dog.barkCount by 1.
POST /v1/dogs/1/barks HTTP/1.1
Host: api.animals.com
Authorization: Basic mAUhhuE08u724bh249a2xaP=
(document body, if needed)
201 Created
Location: http://api.animals.com/v1/dogs/1/barks/a65h44
Here, a bark is created as a result of the request, so the status 201 Created applies.
If the creation is asynchronous, a 202 Accepted is required (as the HTTP RFC says) instead.
The timestamp saved is part of the bark resource and can be retrieved with a GET to it. The updated dog can be "documented" in that GET dogs/X/barks/Y as well.
4. bark runs a system command to pull the latest version of the dog code down from Github. It then sends a text message to dog.owner telling them that the new dog code is in production.
The wording of this one is complicated, but it pretty much is a simple asynchronous task:
POST /v1/dogs/1/barks HTTP/1.1
Host: api.animals.com
Authorization: Basic mAUhhuE08u724bh249a2xaP=
(document body, if needed)
202 Accepted
Location: http://api.animals.com/v1/dogs/1/barks/a65h44
The client then would issue GETs to /v1/dogs/1/barks/a65h44 to learn the current state (whether the code was pulled, whether the message was sent to the owner, and so on). Whenever the dog changes, a 303 is applicable.
Wrapping up
Quoting Roy Fielding:
The only thing REST requires of methods is that they be uniformly
defined for all resources (i.e., so that intermediaries don’t have to
know the resource type in order to understand the meaning of the
request).
In the above examples, POST is uniformly designed. It will make the dog "bark". That is not safe (meaning bark has effects on the resources), nor idempotent (each request yields a new bark), which fits the POST verb well.
A programmer would know: a POST to barks yields a bark. The response status codes (also with entity-body and headers when necessary) do the job of explaining what changed and how the client can and should proceed.
Note: The primary sources used were: "Restful Web Services" book, the HTTP RFC and Roy Fielding's blog.
Edit:
The question and thus the answer have changed quite a bit since they were first created. The original question asked about the design of a URI like:
ACTION http://api.animals.com/v1/dogs/1/?action=bark
Below is the explanation of why it is not a good choice:
How clients tell the server WHAT TO DO with the data is the method information.
RESTful web services convey method information in the HTTP method.
Typical RPC-Style and SOAP services keep theirs in the entity-body and HTTP header.
WHICH PART of the data [the client wants the server] to operate on is the scoping information.
RESTful services use the URI. SOAP/RPC-Style services once again use the entity-body and HTTP headers.
As an example, take Google's URI http://www.google.com/search?q=DOG. There, the method information is GET and the scoping information is /search?q=DOG.
Long story short:
In RESTful architectures, the method information goes into the HTTP method.
In Resource-Oriented Architectures, the scoping information goes into the URI.
And the rule of thumb:
If the HTTP method doesn’t match the method information, the service isn’t RESTful. If the scoping information isn’t in the URI, the service isn’t resource-oriented.
You can put the "bark" "action" in the URL (or in the entity-body) and use POST. No problem there, it works, and may be the simplest way to do it, but this isn't RESTful.
To keep your service really RESTful, you may have to take a step back and think about what you really want to do here (what effects will it have on the resources).
I can't talk about your specific business needs, but let me give you an example: Consider a RESTful ordering service where orders are at URIs like example.com/order/123.
Now say we want to cancel an order, how are we gonna do it? One may be tempted to think that is a "cancellation" "action" and design it as POST example.com/order/123?do=cancel.
That is not RESTful, as we discussed above. Instead, we might PUT a new representation of the order with a canceled element set to true:
PUT /order/123 HTTP/1.1
Content-Type: application/xml
<order id="123">
<customer id="89987">...</customer>
<canceled>true</canceled>
...
</order>
And that's it. If the order can't be canceled, a specific status code can be returned. (A subresource design, like POST /order/123/canceled with the entity-body true may, for simplicity, also be available.)
In your specific scenario, you may try something similar. That way, while a dog is barking, for example, a GET at /v1/dogs/1/ could include that information (e.g. <barking>true</barking>). Or... if that's too complicated, loosen up your RESTful requirement and stick with POST.
Update:
I don't want to make the answer too big, but it takes a while to get the hang of exposing an algorithm (an action) as a set of resources. Instead of thinking in terms of actions ("do a search for places on the map"), one needs to think in terms of the results of that action ("the list of places on the map matching the search criteria").
You may find yourself coming back to this step if you find that your design doesn't fit HTTP's uniform interface.
Query variables are scoping information, but do not denote new resources (/post?lang=en is clearly the same resource as /post?lang=jp, just a different representation). Rather, they are used to convey client state (like ?page=10, so that state is not kept in the server; ?lang=en is also an example here) or input parameters to algorithmic resources (/search?q=dogs, /dogs?code=1). Again, not distinct resources.
HTTP verbs' (methods) properties:
Another clear point showing that ?action=something in the URI is not RESTful is the set of properties of the HTTP verbs:
GET and HEAD are safe (and idempotent);
PUT and DELETE are idempotent only;
POST is neither.
Safety: A GET or HEAD request is a request to read some data, not a request to change any server state. The client can make a GET or HEAD request 10 times and it's the same as making it once, or never making it at all.
Idempotence: An idempotent operation is one that has the same effect whether you apply it once or more than once (in math, multiplying by zero is idempotent). If you DELETE a resource once, deleting it again will have the same effect (the resource is GONE already).
POST is neither safe nor idempotent. Making two identical POST requests to a 'factory' resource will probably result in two subordinate resources containing the same information. With overloaded (method in URI or entity-body) POST, all bets are off.
Both these properties were important to the success of the HTTP protocol (over unreliable networks!): how many times have you refreshed (GET) a page without waiting until it was fully loaded?
Creating an action and placing it in the URL clearly breaks the HTTP methods' contract. Once again, the technology allows you, you can do it, but that is not RESTful design.
I answered earlier, but this answer contradicts my old answer and follows a much different strategy for coming to a solution. It shows how the HTTP request is built from the concepts that define REST and HTTP. It also uses PATCH instead of POST or PUT.
It goes through the REST constraints, then the components of HTTP, then a possible solution.
REST
REST is a set of constraints intended to be applied to a distributed hypermedia system in order to make it scalable. Even to make sense of it in the context of remotely controlling an action, you have to think of remotely controlling an action as a part of a distributed hypermedia system -- a part of a system for discovering, viewing, and modifying interconnected information. If that's more trouble than it's worth, then it's probably no good to try to make it RESTful. If you just want a "control panel" type GUI on the client that can trigger actions on the server via port 80, then you probably want a simple RPC interface like JSON-RPC via HTTP requests/responses or a WebSocket.
But REST is a fascinating way of thinking and the example in the question happens to be easy to model with a RESTful interface, so let's take on the challenge for fun and for education.
REST is defined by four interface constraints:
identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.
You ask how you can define an interface, meeting these constraints, via which one computer tells another computer to make a dog bark. Specifically, you want your interface to be HTTP, and you don't want to discard the features that make HTTP RESTful when used as intended.
Let's begin with the first constraint: resource identification.
Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on.
So a dog is a resource. It needs to be identified.
More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers.
You model a dog by taking a set of identifiers and representations and saying they are all associated with each other at a given time. For now, let's use the identifier "dog #1". That brings us to the second and third constraints: resource representation and self-description.
REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes.
Following is a sequence of bytes capturing the intended state of the dog, i.e. the representation we wish to be associated with the identifier "dog #1" (note that it only represents part of the state as it does not regard the dog's name, health, or even past barks):
It has been barking every 10 minutes since the time this state change was effected, and will continue indefinitely.
It is supposed to be attached to metadata that describes it. This metadata might be useful:
It is an English statement. It describes part of the intended state. If it is received multiple times, only allow the first to have an effect.
Finally, let's look at the fourth constraint: HATEOAS.
REST ... views an application as a cohesive structure of information and control alternatives through which a user can perform a desired task. For example, looking-up a word in an on-line dictionary is one application, as is touring through a virtual museum, or reviewing a set of class notes to study for an exam. ... The next control state of an application resides in the representation of the first requested resource, so obtaining that first representation is a priority. ... The model application is therefore an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations.
In a RESTful interface, the client receives a resource representation in order to figure out how it should receive or send a representation. There must be a representation somewhere in the application from which the client can figure out how to receive or send all representations it should be able to receive or send, even if it follows a chain of representations to arrive at that information. This seems simple enough:
The client asks for a representation of a resource identified as the homepage; in response, it gets a representation that contains an identifier of every dog the client might want. The client extracts an identifier from it and asks the service how it can interact with the identified dog, and the service says the client can send an English statement describing part of the intended state of the dog. Then the client sends such a statement and receives a success message or an error message.
HTTP
HTTP implements REST constraints as follows:
resource identification: URI
resource representation: entity-body
self-description: method or status code, headers, and possibly parts of the entity-body (e.g. the URI of an XML schema)
HATEOAS: hyperlinks
You've decided on http://api.animals.com/v1/dogs/1 as the URI. Let's assume the client got this from some page on the site.
Let's use this entity-body (the value of next is a timestamp; a value of 0 means 'when this request is received'):
{"barks": {"next": 0, "frequency": 10}}
Now we need a method. PATCH fits the "part of the intended state" description we decided on:
The PATCH method requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI.
And some headers:
To indicate the media type of the entity-body: Content-Type: application/json
To make sure it only happens once: If-Unmodified-Since: <date/time this was first sent>
And we have a request:
PATCH /v1/dogs/1/ HTTP/1.1
Host: api.animals.com
Content-Type: application/json
If-Unmodified-Since: <date/time this was first sent>
[other headers]
{"barks": {"next": 0, "frequency": 10}}
On success, the client should receive a 204 status code in response, or a 205 if the representation of /v1/dogs/1/ has changed to reflect the new barking schedule.
On failure, it should receive a 403 and a helpful message explaining why.
It is not essential to REST for the service to reflect the bark schedule in a representation in response to GET /v1/dogs/1/, but it would make the most sense if a JSON representation included this:
"barks": {
"previous": [x_1, x_2, ..., x_n],
"next": x_n,
"frequency": 10
}
Treat the cron job as an implementation detail that the server hides from the interface. That's the beauty of a generic interface. The client doesn't have to know what the server does behind the scenes; all it cares about is that the service understands and responds to requested state changes.
Most people use POST for this purpose. It is appropriate for performing "any unsafe or nonidempotent operation when no other HTTP method seems appropriate".
APIs such as XMLRPC use POST to trigger actions that can run arbitrary code. The "action" is included in the POST data:
POST /RPC2 HTTP/1.0
User-Agent: Frontier/5.1.2 (WinNT)
Host: betty.userland.com
Content-Type: text/xml
Content-length: 181
<?xml version="1.0"?>
<methodCall>
<methodName>examples.getStateName</methodName>
<params>
<param>
<value><i4>41</i4></value>
</param>
</params>
</methodCall>
The RPC example is given to show that POST is the conventional choice of HTTP verb for server-side methods. Here are Roy Fielding's thoughts on POST -- he pretty much says it is RESTful to use the HTTP methods as specified.
Note that RPC itself isn't very RESTful because it isn't resource oriented. But if you need statelessness, caching, or layering, it's not hard to make the appropriate transformations. See http://blog.perfectapi.com/2012/opinionated-rpc-apis-vs-restful-apis/ for an example.
POST is the HTTP method designed for
Providing a block of data...to a data-handling process
Server-side methods handling non-CRUD-mapped actions is what Roy Fielding intended with REST, so you're good there, and that's why POST was made to be non-idempotent. POST will handle most posting of data to server-side methods to process the information.
That said, in your dog-barking scenario, if you want a server-side bark to be performed every 10 minutes, but for some reason need the trigger to originate from a client, PUT would serve the purpose better, because of its idempotence. Well, strictly by this scenario there's no apparent risk of multiple POST requests causing your dog to meow instead, but anyway that's the purpose of the two similar methods. My answer to a similar SO question may be useful for you.
If we assume Barking is an inner / dependent / sub resource which the consumer can act on, then we could say:
POST http://api.animals.com/v1/dogs/1/bark
dog number 1 barks
GET http://api.animals.com/v1/dogs/1/bark
returns the last bark timestamp
DELETE http://api.animals.com/v1/dogs/1/bark
doesn't apply! so ignore it.
Earlier revisions of some answers suggested you use RPC. You do not need to look to RPC as it is perfectly possible to do what you want whilst adhering to the REST constraints.
Firstly, don't put action parameters in the URL. The URL defines what you are applying the action to, and query parameters are part of the URL. It should be thought of entirely as a noun. http://api.animals.com/v1/dogs/1/?action=bark is a different resource — a different noun — to http://api.animals.com/v1/dogs/1/. [n.b. Asker has removed the ?action=bark URI from the question.] For example, compare http://api.animals.com/v1/dogs/?id=1 to http://api.animals.com/v1/dogs/?id=2. Different resources, distinguished only by query string. So the action of your request, unless it corresponds directly to a bodyless existing method type (TRACE, OPTIONS, HEAD, GET, DELETE, etc) must be defined in the request body.
Next, decide if the action is "idempotent", meaning that it can be repeated without adverse effect (see the next paragraph for more explanation). For example, setting a value to true can be repeated if the client is unsure that the desired effect happened. They send the request again and the value remains true. Adding 1 to a number is not idempotent. If the client sends the Add1 command, isn't sure it worked, and sends it again, did the server add one or two? Once you have determined that, you're in a better position to choose between PUT and POST for your method.
Idempotent means a request can be repeated without changing the outcome. These effects do not include logging and other such server admin activity. Using your first and second examples, sending two emails to the same person does result in a different state than sending one email (the recipient has two in their inbox, which they might consider to be spam), so I would definitely use POST for that. If the barkCount in example 2 is intended to be seen by a user of your API or affects something that is client-visible, then it is also something that would make the request non-idempotent. If it is only to be viewed by you, then it counts as server logging and should be ignored when determining idempotency.
Lastly, determine if the action you want to perform can be expected to succeed immediately or not. BarkDog is a quickly completing action. RunMarathon is not. If your action is slow, consider returning a 202 Accepted, with a URL in the response body for a user to poll to see if the action is complete. Alternatively, have users POST to a list URL like /marathons-in-progress/ and then when the action is done, redirect them from the in progress ID URL to the /marathons-complete/ URL.
For the specific cases #1 and #2, I would have the server host a queue, and the client post batches of addresses to it. The action would not be SendEmails, but something like AddToDispatchQueue. The server can then poll the queue to see if there are any email addresses waiting, and send emails if it finds any. It then updates the queue to indicate that the pending action has now been performed. You would have another URI showing the client the current state of the queue. To avoid double-sending of emails, the server could also keep a log of who it has sent this email to, and check each address against that to ensure it never sends two to the same address, even if you POST the same list twice to the queue.
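A rough sketch of that queue protocol (the URIs and payload below are hypothetical):
POST /v1/dogs/1/dispatch-queue HTTP/1.1
Host: api.animals.com
Content-Type: application/json

{"addresses": ["owner@example.com", "friend@example.com"]}

202 Accepted
Location: http://api.animals.com/v1/dogs/1/dispatch-queue/pending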
When choosing a URI for anything, try to think of it as a result, not an action. For example, google.com/search?q=dogs shows the results of a search for the word "dogs". It does not necessarily perform the search.
Cases #3 and #4 from your list are also not idempotent actions. You suggest that the different effects might call for differently structured API calls. In all four cases I would use the same API, as all four change the "world state."
See my new answer -- it contradicts this one and explains REST and HTTP more clearly and accurately.
Here's a recommendation that happens to be RESTful but is certainly not the only option. To start barking when the service receives the request:
POST /v1/dogs/1/bark-schedule HTTP/1.1
...
{"token": 12345, "next": 0, "frequency": 10}
token is an arbitrary number that prevents redundant barks no matter how many times this request is sent.
next indicates the time of the next bark; a value of 0 means 'ASAP'.
Whenever you GET /v1/dogs/1/bark-schedule, you should get something like this, where t is the time of the last bark and u is t + 10 minutes:
{"last": t, "next": u}
I highly recommend that you use the same URL to request a bark that you use to find out about the dog's current barking state. It's not essential to REST, but it emphasizes the act of modifying the schedule.
The appropriate status code is probably 205. I'm imagining a client that looks at the current schedule, POSTs to the same URL to change it, and is instructed by the service to give the schedule a second look to prove that it has been changed.
Explanation
REST
Forget about HTTP for a moment. It's essential to understand that a resource is a function that takes time as input and returns a set containing identifiers and representations. Let's simplify that to: a resource is a set R of identifiers and representations; R can change -- members can be added, removed, or modified. (Though it's bad, unstable design to remove or modify identifiers.) We say an identifier that is an element of R identifies R, and that a representation that is an element of R represents R.
Let's say R is a dog. You happen to identify R as /v1/dogs/1. (Meaning /v1/dogs/1 is a member of R.) That's just one of many ways you could identify R. You could also identify R as /v1/dogs/1/x-rays and as /v1/rufus.
How do you represent R? Maybe with a photograph. Maybe with a set of X-rays. Or maybe with an indication of the date and time when R last barked. But remember that these are all representations of the same resource. /v1/dogs/1/x-rays is an identifier of the same resource that is represented by an answer to the question "when did R last bark?"
HTTP
Multiple representations of a resource aren't very useful if you can't refer to the one you want. That's why HTTP is useful: it lets you connect identifiers to representations. That is, it is a way for the service to receive a URL and decide which representation to serve to the client.
At least, that's what GET does. PUT is basically the inverse of GET: you PUT a representation r at the URL if you wish for future GET requests to that URL to return r, with some possible translations like JSON to HTML.
POST is a looser way of modifying a representation. Think of there being display logic and modification logic that are counterparts to each other -- both corresponding to the same URL. A POST request is a request for the modification logic to process the information and modify any representations (not just the representation located by the same URL) as the service sees fit. Pay attention to the third paragraph after 9.6 PUT: you're not replacing the thing at the URL with new content; you're asking the thing at the URL to process some information and intelligently respond in the form of informative representations.
In our case, we ask the modification logic at /v1/dogs/1/bark-schedule (which is the counterpart to the display logic that tells us when it last barked and when it will next bark) to process our information and modify some representations accordingly. In response to future GETs, the display logic corresponding to the same URL will tell us that the dog is now barking as we wish.
Think of the cron job as an implementation detail. HTTP deals in viewing and modifying representations. From now on, the service will tell the client when the dog last barked and when it will bark next. From the service's perspective, that is honest because those times correspond with past and planned cron jobs.
REST is a resource-oriented approach; it is not action-driven the way RPC is.
If you want your server to bark, you should look into different ideas like JSON-RPC, or into WebSocket communication.
Every attempt to keep it RESTful will fail in my opinion: you can issue a POST with an action parameter; you are not creating any new resources, but since you may have side effects, POST is the safer choice.

GWT RPC server side safe deserialization to check field size

Suppose I send objects of the following type from GWT client to server through RPC. The objects get stored to a database.
public class MyData2Server implements Serializable
{
private String myDataStr;
public String getMyDataStr() { return myDataStr; }
public void setMyDataStr(String newVal) { myDataStr = newVal; }
}
On the client side, I constrain the field myDataStr to be say 20 character max.
I have been reading about web-application security. If I learned one thing, it is that client data should not be trusted. The server should therefore check the data. So I feel like I ought to check on the server that my field is indeed not longer than 20 characters, and otherwise abort the request, since I know it must be an attack attempt (assuming no bug on the client side, of course).
So my questions are:
How important is it to actually check on the server side that my field is not longer than 20 characters? I mean, what are the chances/risks of an attack and how bad could the consequences be? From what I have read, it looks like it could go as far as bringing the server down through overflow and denial of service, but, not being a security expert, I could be mis-interpreting.
Assuming I would not be wasting my time doing the field-size check on the server, how should one accomplish it? I seem to recall reading (sorry I no longer have the reference) that a naive check like
if (myData2ServerObject.getMyDataStr().length() > 20) throw new MyException();
is not the right way. Instead one would need to define (or override?) the method readObject(), something like in here. If so, again how should one do it within the context of an RPC call?
Thank you in advance.
How important is it to actually check on the server side that my field is not longer than 20 characters?
It's 100% important, except maybe if you can trust the end-user 100% (e. g. some internal apps).
I mean what are the chances
Generally: increasing. The exact probability can only be answered for your concrete scenario individually (i.e. no one here will be able to tell you, though I would also be interested in general statistics). What I can say is that tampering is trivially easy. It can be done in the JavaScript code (e.g. using Chrome's built-in dev tools debugger) or by editing the clearly visible HTTP request data.
/risks of an attack and how bad could the consequences be?
The risks can vary. The most direct risk can be evaluated by thinking: "What could you store and do, if you can set any field of any GWT-serializable object to any value?" This is not only about exceeding the size, but maybe tampering with the user ID etc.
From what I have read, it looks like it could go as far as bringing the server down through overflow and denial of service, but not being a security expert, I could be mis-interpreting.
This is yet another level to deal with, and cannot be addressed with server side validation within the GWT RPC method implementation.
Instead one would need to define (or override?) the method readObject(), something like in here.
I don't think that's a good approach. It tries to accomplish two things, but can do neither of them very well. There are two kinds of checks on the server side that must be done:
On a low level, when the bytes come in (before they are converted by RemoteServiceServlet to a Java Object). This needs to be dealt with on every server, not only with GWT, and would need to be answered in a separate question (the answer could simply be a server setting for the maximum request size).
On a logical level, after you have the data in the Java object. For this, I would recommend a validation/authorization layer. One of the awesome features of GWT is that you can now use JSR 303 validation on both the server and the client side. It doesn't cover every aspect (you would still have to test for user permissions), but it does cover your "@Size(max = 20)" use case.