REST API versioning when using Atom for resource collections

I know this is something that has been discussed over and over, and I have done extensive research to get where I am so far, but I can't seem to get over the final hurdle.
I am designing a custom REST API for our application, and have decided that I would like to version using media types, e.g. application/vnd.mycompany.resource.v2+xml. I realise the pros and cons of this model, and it seems to be the most flexible option.
Hence my GET would look as follows:
=== REQUEST ===>
GET /workspaces/123/contacts?firstName=Neil&accessID=789264&timestamp=1317611 HTTP/1.1
Accept: application/vnd.mycompany.contact-v2+xml
<== RESPONSE ===
HTTP/1.1 200 OK
Content-Type: application/vnd.mycompany.contact-v2+xml
<contact>
<name>Neil Armstrong</name>
<mobile>+61456838435</mobile>
<email>neil.armstrong@space.com</email>
</contact>
The problem is that I would like to use Atom feeds and entries to represent my resource collections. This way I can harness Atom's searching and pagination without it infecting my resources or API structure.
If I use Atom for my requests, my request structure now looks like:
=== REQUEST ===>
GET /workspaces/123/contacts HTTP/1.1
Accept: application/atom+xml; type=feed;
<== RESPONSE ===
HTTP/1.1 200 OK
Content-Type: application/atom+xml; type=feed;
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Contacts Feed</title>
<link rel="self" href="https://api.mycompany.com/workspaces/contacts"/>
<updated>2011-11-13T18:30:02Z</updated>
...
<entry>
<title>Neil Armstrong</title>
...
<content type="application/vnd.mycompany.contact-v2+xml">
<contact>
<name>Neil Armstrong</name>
<mobile>+61456838435</mobile>
<email>neil.armstrong@space.com</email>
</contact>
</content>
</entry>
</feed>
Using Atom to represent my collections of resources, I lose the ability to version using media types, as the media type is now hidden within the content of each Atom entry:
<content type="application/vnd.mycompany.contact-v2+xml">
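The type attribute is still machine-readable, though. As a minimal sketch (Python's standard-library ElementTree; the function name entry_content_types is hypothetical), a client that negotiated application/atom+xml could read each entry's declared content type and decide whether it understands that version:

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def entry_content_types(feed_xml):
    """Return the type attribute of each entry's <content> element, in feed order."""
    root = ET.fromstring(feed_xml)
    types = []
    for entry in root.findall("atom:entry", ATOM_NS):
        content = entry.find("atom:content", ATOM_NS)
        # Entries without a <content> element are reported as None.
        types.append(content.get("type") if content is not None else None)
    return types
```

This only surfaces the per-entry media type; it does not by itself let the client ask for a particular version up front.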
What is the best practice for determining the media type version of my resource while still utilising the power of Atom for resource collection management?
My thinking is that I could pass it through the Accept header, e.g.:
Accept: application/atom+xml; type=feed; version=1.0
But then this is confusing as you are asking for version 1.0 of the Atom feed, not the resource itself...
Any help would be really appreciated!!

The problem is that IMHO you're misusing the media types.
Media types give you information on the STRUCTURE of the actual payload, but not the SEMANTICS of the payload. "I know this is an XHTML page, but I don't know if it's a blog post or an item on Amazon." By being an XHTML page, you know how to get the component parts out of the payload and ask interesting questions, but interpretation of the payload is not part of the media type.
Consider an example, paraphrased from one by Roy Fielding: sending a 10,000-bit array as a GIF file that's 100x100 pixels. GIF, as "everyone knows", is used for sending pictures, but it's really simpler than that. It's a mechanism for sending structured binary data that just happens, most of the time, to be images. So, in this case of using it to send a 10,000-bit array (perhaps represented as a grayscale image of 00 and FF), you get the benefit of a common decoder (GIF), GIF's built-in compression, etc.
But, in this case, it's not a picture. You can show it as a picture, but it's a meaningless picture. The classic semantic of using it for pictures is not relevant in this case. The benefit is the ubiquity of the format.
Another example: years ago, an engineer was doing radar studies. He would take the 3-view drawings of aircraft you would find in books and such, and encode them into AutoCAD drawings using a tablet. The DWG format was well documented, and he had code to read them. What he wanted was the coordinates and measurements of the specific aircraft.
So, in the end he had a bunch of "meaningless" AutoCAD files with nothing but a bunch of lines in it that "made no sense". But in fact they were chock full of good information for his domain. The DWG file was the media-type, but these weren't "CAD drawings". (Can you say "spontaneous reuse"?)
It's fine to version something via media-type, but that's only relevant if the media-type is in fact changing. ATOM, as you noted, isn't changing, or at least it's not changing under your control, and you may choose not to support the new version if/when it does change. But ATOM is not changing because how it represents its information, how that information is encoded, is not changing. The information may well change, in fact it changes all the time. Every ATOM feed is different with different information. MOST have similar semantics (blog feeds), but many do not (for example, perhaps your scenario).
But how you parse and get information out of the ATOM feed will not change, and that's what the media type represents: an encoding of information, not the information itself.
So, if you want to detect versioning, then check within your payload. Inspect it. You KNOW where, in V1 of your data, the invoice number is (perhaps it's at invoice/inv_no in XPath). If the invoice number is NOT there, then what do you do? You either (a) look in some other well-known place (i.e. V2), or (b) throw an error ("Whatever this is, it's not an invoice!"). You would have to do that no matter what, because you could be getting anything, regardless of what the version says, or the media type says, or what anything else says.
If you make your payloads forward compatible so that they resist breaking changes, then versioning is a matter of making use of all the information you can see. If you get A and B, then while you'd like to have C and D as well, the clients can get by with the more limited information. Or, if the clients see C and D, they know to ignore A and B, as that data is deprecated. Same with the server: if something is sending A and B, it implies an older processing model than if it sent C and D.
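A minimal sketch of that check in Python (standard-library ElementTree). The V1 location inv_no comes from the example above; the V2 location header/number and the function name are hypothetical. Each version's well-known location is probed in turn, and anything matching none of them is rejected:

```python
import xml.etree.ElementTree as ET

# Where each payload version keeps the invoice number, relative to <invoice>.
# "inv_no" is the V1 example; "header/number" is a hypothetical V2 location.
KNOWN_LOCATIONS = [("v1", "inv_no"), ("v2", "header/number")]


def detect_invoice_version(xml_text):
    """Return (version, invoice_number) by probing each well-known location."""
    root = ET.fromstring(xml_text)
    if root.tag == "invoice":
        for version, path in KNOWN_LOCATIONS:
            element = root.find(path)
            if element is not None and element.text and element.text.strip():
                return version, element.text.strip()
    # Wrong root element, or no known location matched: refuse the payload.
    raise ValueError("Whatever this is, it's not an invoice!")
```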
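That "tolerant reader" behaviour could look roughly like this in Python; A/B (older) and C/D (newer) are the placeholder field names from the paragraph above, and read_payload is a hypothetical name. Keys the reader doesn't know are simply ignored:

```python
def read_payload(payload):
    """Tolerant reader: prefer the newer fields, fall back to the older ones.

    "A"/"B" (older) and "C"/"D" (newer) are placeholder field names.
    Any other keys in the payload are ignored.
    """
    if "C" in payload and "D" in payload:
        # A sender that includes C and D is using the newer processing model.
        return "new", (payload["C"], payload["D"])
    if "A" in payload and "B" in payload:
        # Older senders only know A and B; the reduced data is still usable.
        return "old", (payload["A"], payload["B"])
    raise ValueError("payload has neither the older nor the newer fields")
```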
You can version through rel names "order" vs "order_2", old clients only know to use "order", new clients know to use "order_2" and follow that link instead.
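A sketch of the client side of rel-based versioning in Python: each client lists the rels it understands, in preference order, and follows the first one the server offers. The function name and the link dictionaries are hypothetical:

```python
def pick_link(links, known_rels):
    """Return the href of the first rel in known_rels that the server offers.

    links: list of {"rel": ..., "href": ...} dicts taken from the response.
    known_rels: the rels this client understands, most preferred first.
    """
    offered = {link["rel"]: link["href"] for link in links}
    for rel in known_rels:
        if rel in offered:
            return offered[rel]
    return None  # no link this client understands
```

An old client would call pick_link(links, ["order"]), while a new one would call pick_link(links, ["order_2", "order"]) and still work against a server that only offers "order".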
Or you simply include a version identifier in the payload, that's an easy check as well (especially since it's early in your design).
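For example, assuming (hypothetically) that the version travels as a version attribute on the payload's root element, the check is a few lines of Python; treating an unversioned payload as version "1" is also an assumption:

```python
import xml.etree.ElementTree as ET

# Versions this server knows how to process (hypothetical).
SUPPORTED_VERSIONS = {"1", "2"}


def payload_version(xml_text):
    """Read a version attribute from the root element; unversioned payloads count as "1"."""
    root = ET.fromstring(xml_text)
    version = root.get("version", "1")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported payload version {version!r}")
    return version
```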
There are a lot of ways to manage the versioning, but the media type really shouldn't be the mechanism. That's why this really isn't a "problem" with ATOM. So, it's a matter of perspective.
I have another discussion about the Accept header over here: REST API having same object, but light
This is (IMHO) unrelated to your versioning issue, but it's an example of extended media types. But that's only my perception of why and how most folks want "versioning". A case could be made that this is the same thing, but most folks associate versioning with services, not simply data representations, which that other post was mostly about.
In the end, either your client and/or server are flexible enough to handle versioned data or they're not. They will (mostly, they are computers after all. Deterministic my heinie...) do what they're told. A simple rule of "ignore stuff that you don't know" can take you quite far in terms of versioning without ever changing a v1 to a v2, regardless of your encoding. Likewise "work with what you have" is a nice rule for a flexible, tolerant server. If you have problems in either case, that's what errors, logs, operators, and 24hr pagers are for, and you need those anyway.

Related

Which type of request is used for the `delete` button in the REST context?

I am creating a REST API for the Order screen. I have methods:
GET /api/orders
GET /api/orders/{orderId}
I have some buttons on the Order page, and I created a few endpoints for them:
PATCH /api/order/buttons/mark-as-read
PATCH /api/order/buttons/change-status
Now I need to add the delete button. But I don't understand how to do that. I have 2 options:
DELETE /api/orders/{orderId} - but I need to send 2 additional parameters in this request
PATCH /api/order/buttons/delete - I can send my DTO in the body, but it is not a REST approach.
I want to understand which type of request is used for the delete button in the REST context?
PATCH /api/order/buttons/mark-as-read
PATCH /api/order/buttons/change-status
These are a bit strange. PATCH is a method with remote authoring semantics; it implies that you are making a change to the resource identified by the effective target URI.
But that doesn't seem to be the case here; if you are expecting to apply the changes to the document identified by /api/orders/{orderId}, then that should be the target URI, not some other resource.
PATCH /api/orders/1
Content-Type: text/plain
Please mark this order as read.
PATCH /api/orders/1
Content-Type: text/plain
Please change the status of this order to FULFILLED
Of course, we don't normally use "text/plain" and statements that require a human being to interpret, but instead use a patch document format (example: application/json-patch+json) that a machine can be taught to interpret.
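To make that concrete, here is a deliberately partial Python sketch of a server applying a JSON Patch (RFC 6902) document. It supports only the replace operation on object members, and the order fields read and status are hypothetical; a real service would use a complete JSON Patch implementation:

```python
import copy


def apply_replace_ops(document, patch):
    """Apply the "replace" operations of a JSON Patch document (RFC 6902).

    Only "replace" on object members is supported; other ops raise ValueError.
    The input document is not modified.
    """
    result = copy.deepcopy(document)
    for op in patch:
        if op.get("op") != "replace":
            raise ValueError(f"unsupported op {op.get('op')!r}")
        # JSON Pointer (RFC 6901): split on "/", then unescape ~1 -> "/" and ~0 -> "~".
        tokens = [t.replace("~1", "/").replace("~0", "~") for t in op["path"].split("/")[1:]]
        if not tokens:
            raise ValueError("replacing the whole document is not supported here")
        target = result
        for token in tokens[:-1]:
            target = target[token]
        if tokens[-1] not in target:
            # RFC 6902 requires the target location of "replace" to exist.
            raise KeyError(op["path"])
        target[tokens[-1]] = op["value"]
    return result
```

The two text/plain requests above would then become a single PATCH /api/orders/1 with Content-Type: application/json-patch+json and a body such as [{"op": "replace", "path": "/read", "value": true}, {"op": "replace", "path": "/status", "value": "FULFILLED"}].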
I want to understand which type of request is used for the delete button in the REST context?
If the semantics of "delete" belong to the Orders domain (for instance, if it is a button that signals a desire to cancel an order) then you should be using PUT or PATCH (if you are communicating by passing updated representations of the resource) or POST (if you are sending instructions that the server will interpret).
The heuristic to consider: how would you do this on a plain HTML page? Presumably you would have a "cancel my order" form, with input controls to collect information from the user, and possibly some hidden fields. When the user submits the form, the browser would use the form data and HTML's form processing rules to create an application/x-www-form-urlencoded representation of the information, and would then POST that information to the resource identified by the form action.
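The request such a form would produce can be sketched with Python's standard library. The field names action and reason and the base URL are hypothetical; the Request is only built here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def cancel_order_request(base_url, order_id, reason):
    """Build the POST a browser would send for a "cancel my order" form (not sent)."""
    # urlencode produces application/x-www-form-urlencoded data (spaces become "+").
    body = urlencode({"action": "cancel", "reason": reason}).encode("ascii")
    return Request(
        f"{base_url}/api/orders/{order_id}",  # target the order itself
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```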
The form action could be anything; you could use /api/orders/1/cancel, analogous to your mark-as-read and change-status design; but if you can use the identifier of the order (which is to say, the resource that you are changing), then you get the advantages of standardized cache invalidation for free.
It's normal for a single message handler, which has a single responsibility in the transfer of documents over a network domain (e.g. POST /api/orders/{orderId}), to interpret the payload and select one of multiple handlers (change-status, mark-as-read, cancel) in your domain.
You suggest using something like this: PATCH /api/orders/{orderId} with an OrderUpdatesDto as a JSON string in the request body?
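A minimal Python sketch of that pattern: one handler for POST /api/orders/{orderId} parses the body and dispatches to a domain handler. The "type" field, the command names and the order fields are all hypothetical:

```python
import json


def mark_as_read(order, command):
    order["read"] = True


def change_status(order, command):
    order["status"] = command["status"]


def cancel(order, command):
    order["status"] = "CANCELLED"
    order["cancel_reason"] = command.get("reason", "")


# Hypothetical command names mapped to domain handlers.
HANDLERS = {"mark-as-read": mark_as_read, "change-status": change_status, "cancel": cancel}


def post_order(order, body):
    """Single message handler: interpret the payload, pick one domain handler."""
    command = json.loads(body)
    if not isinstance(command, dict):
        return 400, {"error": "expected a JSON object"}
    handler = HANDLERS.get(command.get("type"))
    if handler is None:
        return 400, {"error": f"unknown command {command.get('type')!r}"}
    handler(order, command)
    return 200, order
```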
Sort of.
There are three dials here: which effective request URI to use, which payload to use, which method to use.
Because I would want to take advantage of cache invalidation, I'm going to look for designs that use /api/orders/{orderId} as the effective request URI, because that's the URI for the responses that I want to invalidate.
It's fine to use something like a JSON representation of an OrderUpdate message/command/DTO as the payload of the request. But that's not really a good match for remote authoring. So instead of PATCH, I would use POST:
POST /api/orders/1 HTTP/1.1
Content-Type: application/prs.pavel-orderupdate+json
{...}
But you can instead decide to support a remote authoring interface, meaning that the client just edits their local copy of /api/orders/1 and then tells you what changes they made.
That's the case where both PUT (send back the entire document) and PATCH (send back a bunch of edits) can make sense. If GET /api/orders/1 returns a JSON document, then I'm going to look into whether or not I can support one of the general purpose JSON patch document formats; JSON Patch or JSON Merge Patch or something along those lines.
Of course, it can be really hard to get from "changes to a document" to a message that will be meaningful to a non-anemic domain. There are reasons that we might prefer supporting a task based experience, but sending a task centric DTO is not a good fit for PUT/PATCH if you also want caching to work the way I've described above.

How to create and implement a pixel tracking code

OK, here's a goal I've been pursuing for a while.
As is well known, most advertising and analytics companies use a so-called "pixel" code to track website views, transactions, conversions, etc.
I do have a general idea of how it works; the problem is how to implement it. Tracking codes consist of a few parts:
The tracking code itself.
This is the code that users insert in the <head> section of their webpage. Its main goal is to set some customer-specific variables and to load the *.js file.
*.js file.
This file holds all the magic of CRUD (create/read/update/delete) cookies, track user's events and interaction with the webpage.
The pixel code.
This is an <img> tag whose src attribute points to an image file (a *.gif, for example). The request for that file carries all the parameters collected on the page, and the server stores them in the database.
Example:
WordPress pixel code: <img id="wpstats" src="http://stats.wordpress.com/g.gif?host=www.hostname.com&list_of_cookies_value_pairs;" alt="">
Google Analytics:
http://www.google-analytics.com/__utm.gif?utmwv=4&utmn=769876874&etc
Now, it's obvious that the *.gif request has to reach a server side scripting language in order to read the parameters data and store them in a db.
Does anyone have an idea how to implement this in Zend?
UPDATE
Another thing I'm interested in: how do I prevent the user's browser from loading a cached *.gif? Will a random parameter value do the trick? Example: src="pixel.gif?nocache=random_number", where the nocache parameter value is different on every request.
As Zend is built using PHP, it might be worth reading the following question and answer: Developing a tracking pixel.
In addition to this answer and as you're looking for a way of avoiding caching the tracking image, the easiest way of doing this is to append a unique/random string to it, which is generated at runtime.
For example, server-side and with the creation of each image, you might add a random URL id:
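<?php
// Generate a random 8-character hex id (random_bytes requires PHP 7+)
$rand_id = bin2hex(random_bytes(4));
// $vara and $varb are the tracking values collected elsewhere; URL-encode them
$src = 'pixel.php?a=' . urlencode($vara) . '&b=' . urlencode($varb) . '&rand=' . $rand_id;
// Echo the image tag with the random cache-busting parameter appended
echo '<img src="' . htmlspecialchars($src) . '" alt="">';
?>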
Just adding my 2 cents to this thread because I think an important, and frequently used, option is missing: you don't necessarily need a scripting language to capture the request. A more efficient approach is to use the web server's access log (the Apache access log, for instance) to log the request, and then process that log with whatever tools you see fit, like the ELK stack.
This makes serving the requests much lighter, because no scripting language is loaded to prepare the response; it's just a native Apache response, which is typically much more efficient.
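As a rough sketch of the log-processing side (Python standard library; pixel_hits and the default /g.gif path are hypothetical), this pulls pixel requests and their query parameters out of access-log lines in Apache's common/combined format. The results could then be loaded into a database or a pipeline such as the ELK stack:

```python
import re
from urllib.parse import parse_qs, urlsplit

# Matches the start of Apache common/combined log format lines:
# client IP, identd, user, [time], "METHOD target PROTOCOL", status.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" (?P<status>\d{3})'
)


def pixel_hits(lines, pixel_path="/g.gif"):
    """Yield (ip, time, params) for every logged request to the pixel path."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that don't look like access-log entries
        target = urlsplit(match["target"])
        if target.path != pixel_path:
            continue
        # Keep the first value of each query parameter.
        params = {key: values[0] for key, values in parse_qs(target.query).items()}
        yield match["ip"], match["time"], params
```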
First of all, the *.gif doesn't need to be an actual GIF file; the only thing that matters is the Content-Type HTTP header. Set that to image/gif (or any other appropriate type) at the beginning, execute your code, and render some sort of image to the response body.
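A minimal Python sketch of such an endpoint, written as a plain WSGI application (standard library only). pixel_app and the in-memory HITS list are hypothetical stand-ins for a real route and database. It records the query parameters, drops the rand cache-buster, and returns a 1x1 transparent GIF with Content-Type: image/gif and a Cache-Control header asking browsers and proxies not to store it:

```python
import base64
from urllib.parse import parse_qs

# A widely used 1x1 transparent GIF, embedded as base64.
PIXEL_GIF = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

# Hypothetical in-memory store; a real tracker would write to a database or log.
HITS = []


def pixel_app(environ, start_response):
    """WSGI app: record the query parameters, then return a 1x1 GIF."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # The cache-busting parameter carries no tracking data.
    params.pop("rand", None)
    HITS.append({key: values[0] for key, values in params.items()})
    start_response("200 OK", [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL_GIF))),
        # Ask caches not to store the response, so every view hits the server.
        ("Cache-Control", "no-store, no-cache, must-revalidate"),
    ])
    return [PIXEL_GIF]
```

To try it locally: from wsgiref.simple_server import make_server; make_server("", 8000, pixel_app).serve_forever(). The random URL parameter and the Cache-Control header complement each other: the parameter defeats caches that ignore headers, and the header avoids relying on the client to vary the URL.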
All of the above code is correct, but to build on the "g.gif" mentioned above:
You can just add some simple PHP code that writes to an SQL database, or to a file with file_put_contents("file.txt", $opened),
where $opened serves as a counter that is incremented (counter++) each time someone opens your email, and then save that PHP file as "g.gif".
To do all of this, just add these lines:
<Files "g.gif">
AddType application/x-httpd-php .gif
</Files>
to your .htaccess file, but be sure to create a new directory for g.gif (or whatever.gif) that contains only that file and the .htaccess.

How does the email header field 'thread-index' work?

I was wondering if anyone knew how the Thread-Index field in email headers works?
Here are the thread indexes from a simple chain of emails that I sent to myself:
Email 1 Thread-Index: AcqvbpKt7QRrdlwaRBKmERImIT9IDg==
Email 2 Thread-Index: AcqvbpjOf+21hsPgR4qZeVu9O988Eg==
Email 3 Thread-Index: Acqvbp3C811djHLbQ9eTGDmyBL925w==
Email 4 Thread-Index: AcqvbqMuifoc5OztR7ei1BLNqFSVvw==
Email 5 Thread-Index: AcqvbqfdWWuz4UwLS7arQJX7/XeUvg==
I can't say with certainty how to link these emails together. Normally, I would use the In-Reply-To or References field, but I recently found that BlackBerrys do NOT include these fields. They only include the Thread-Index field.
They are base64 encoded Conversation Index values. No need to reverse engineer them as they are documented by Microsoft on e.g. http://msdn.microsoft.com/en-us/library/ms528174(v=exchg.10).aspx and more detailed on http://msdn.microsoft.com/en-us/library/ee202481(v=exchg.80).aspx
Seemingly the indexes in your example don't represent the same conversation, which probably means that the software that sent the mails wasn't able to link them together.
EDIT: Unfortunately I don't have enough reputation to add a comment, but adamo is right that it contains a timestamp: a somewhat esoteric encoded partial FILETIME. But it also contains a GUID, so it is pretty much guaranteed to be unique for that mail (of course, the same mail can exist in multiple copies).
There's a good analysis of how exactly this non-standard "Thread-Index" header appears to be used, in this post and links therefrom, including this PDF (a paper presented at the CEAS 2006 conference) and this follow-up, which includes a comment on the issue from the Evolution source code (which seems to reflect substantial reverse-engineering of this undocumented header).
Executive summary: essentially, the author eventually gives up on using this header and recommends and shows a different approach, which is also implemented in the c-client library, part of the UW IMAP Toolkit open source package (which is not for IMAP only -- don't let the name fool you, it also works for POP, NNTP, local mailboxes, &c).
I wouldn't be surprised if there are mail clients out there which would not be able to link Blackberry's mails to their threads. The Thread-Index header appears to be a Microsoft extension.
Either way, Novell Evolution implements this. Take a look at this short description of how they do it, or this piece of code that finds the thread parent of a given message.
I assume that, because the lengths of the Thread-Index headers in your example are all the same, these messages were all thread starts? Strange that they're only 22 bytes, though I suppose you could try applying the 5-bytes-per-message rule to them and see if it works for you.
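A Python sketch of that rule, assuming the layout described in Microsoft's conversation-index documentation linked above: a 22-byte header whose first 6 bytes hold the high-order 48 bits of a FILETIME, followed by a 16-byte GUID, then one 5-byte block per reply. The function names are hypothetical, and the FILETIME interpretation is an assumption that may not hold for every client (which could explain the invalid DateTimes mentioned in another answer):

```python
import base64
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
HEADER_LEN, CHILD_LEN = 22, 5


def parse_thread_index(value):
    """Decode a Thread-Index header into its start time, GUID and reply depth."""
    raw = base64.b64decode(value)
    if len(raw) < HEADER_LEN or (len(raw) - HEADER_LEN) % CHILD_LEN:
        raise ValueError("expected a 22-byte header plus 5-byte child blocks")
    # Assumption: bytes 0-5 are the high-order 48 bits of a FILETIME;
    # the low 16 bits are not stored.
    ticks = int.from_bytes(raw[:6], "big") << 16
    return {
        "started": FILETIME_EPOCH + timedelta(microseconds=ticks // 10),
        "guid": raw[6:HEADER_LEN].hex(),
        "depth": (len(raw) - HEADER_LEN) // CHILD_LEN,
    }


def is_direct_reply(parent_value, child_value):
    """True if the child's index is the parent's index plus one 5-byte block."""
    parent = base64.b64decode(parent_value)
    child = base64.b64decode(child_value)
    return len(child) == len(parent) + CHILD_LEN and child.startswith(parent)
```

Applied to the five headers in the question, each decodes to a 22-byte header (depth 0) with a different GUID, which is consistent with them being five separate thread starts rather than one conversation.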
If you are interested in parsing the Thread-Index in C# please take a look at this post
http://forum.rebex.net/questions/3841/how-to-interprete-thread-index-header
The snippet you will find there will let you parse the Thread-Index and retrieve the Thread GUID and message DateTime. There is a problem, however: it does not work for all Thread-Indexes out there. The open question is why some Thread-Indexes produce an invalid DateTime, and what to do to support all of them.

Is it ok to return application/octet-stream from a REST interface?

Am I breaking any laws in the REST bible by returning application/octet-stream for my responses? The REST endpoint receives 5 image URLs.
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif" }
and it will download these and return them as application/octet-stream.
CLARIFICATION: The client that invokes this REST interface is a mobile app. Every additional network connection reduces battery life by a few milliamps. I am forced to use REST because it is a company standard; if not, I would do my own binary protocol.
It is not so good, as the client will not know what to do with such binary data except store those bytes somewhere or send them on to some other process (if this is all you need to do with your data, then it is fine).
You may take a look at multipart content types. IMO, a multipart message containing several image/gif parts would be a better alternative.
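A minimal Python sketch of building such a multipart/mixed body (RFC 2046) from images the server has already downloaded. The function name is hypothetical, and using Content-Location to tell the client which requested URL each part came from is a design choice, not a requirement:

```python
import secrets


def build_multipart_mixed(images, boundary=None):
    """Pack already-downloaded GIF images into one multipart/mixed body (RFC 2046).

    images maps each source URL to its raw bytes.
    Returns (content_type_header_value, body_bytes).
    """
    # A random boundary is very unlikely to occur inside the image data.
    boundary = boundary or secrets.token_hex(16)
    delimiter = b"--" + boundary.encode("ascii")
    body = bytearray()
    for url, data in images.items():
        body += delimiter + b"\r\n"
        body += b"Content-Type: image/gif\r\n"
        # Content-Location tells the client which requested URL this part holds.
        body += b"Content-Location: " + url.encode("ascii") + b"\r\n\r\n"
        body += data + b"\r\n"
    body += delimiter + b"--\r\n"  # closing delimiter
    return f"multipart/mixed; boundary={boundary}", bytes(body)
```

The first return value goes into the response's Content-Type header, so the client learns both the overall format and where each image starts and ends.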
From the sound of it, this is much more like an RPC call. Specifically, "here's a list of URLs, send me back an archive".
That process is not particularly RESTful, as REST is not an RPC based system.
What you need to do is treat the archives as resources, and provide a way to create them and then serve them up.
For example you could:
POST /archives
Content-Type: application/json
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif" }
As a result, you would get
HTTP/1.1 201 Created
Location: http://example.com/archives/1234
Content-Type: application/json
Then, you could make a request to http://example.com:
GET /archives/1234
Accept: multipart/mixed
Here, you will get the actual archive in a single request (like you want), only it's a multipart formatted result. (application/zip would work too; that's a zip file.)
If you did:
GET /archives/1234
Accept: application/json
You would get back the JSON you sent originally (so you could perhaps edit and update the archive, which you may not want to support by sending up the binary images).
To change it, you would simply PUT back the update:
PUT /archives/1234
Content-Type: application/json
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif",
"image3": "http://www.foo2.foo/4.gif" }
The resource is /archives/1234, that's its name.
It has two representations in this case: the JSON version, and the actual, binary archive. Your service distinguishes between the two using the content type specified in the Accept header. That header is the client telling you what it wants.
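A simplified Python sketch of that server-side choice (the function name is hypothetical). It honours q-values and */* or type/* ranges, but deliberately skips RFC 9110's rule that more specific ranges take precedence; a None result would map to 406 Not Acceptable:

```python
def choose_representation(accept, available=("application/json", "multipart/mixed")):
    """Pick the available media type the client prefers (simplified Accept handling).

    Honours q-values, "*/*" and "type/*" ranges, but skips the rule that more
    specific ranges take precedence. Returns None when nothing is acceptable.
    """
    if not accept or not accept.strip():
        return available[0]  # no Accept header: any representation will do
    best, best_q = None, 0.0
    for item in accept.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        media_range = media_range.lower()
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        range_type, _, range_sub = media_range.partition("/")
        for candidate in available:
            cand_type = candidate.split("/")[0]
            matches = media_range in ("*/*", candidate) or (
                range_sub == "*" and range_type == cand_type
            )
            if matches and q > best_q:
                best, best_q = candidate, q
    return best
```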
When you're done with the archive, simply DELETE it
DELETE /archives/1234
Or you can have the server expire the resource at some later time.
Why not have five separate REST calls?
Seems cleaner and divides more logically. It will also run the downloads in parallel, 2 or more at a time depending on the browser you are using.
They are called REST principles, not laws, but no, you are not "breaking" them, IMO. REST is about resources being addressable by a URL, and (where appropriate) available in multiple formats. It doesn't say what the format should be. There's a simple description of what REST means in this article.
However, as @Andrey says, there are nicer ways to handle sending multiple data objects than inventing your own ad hoc format. The multipart MIME type/format is one alternative, and another is to send the objects packed up as a tar, zip, or similar archive file.
IMO, the real problem with using "application/octet-stream" is that it doesn't tell anyone anything about how the data is actually formatted. Rather, your client has to "know" how it is formatted and interpret it accordingly. And the problems with inventing your own format are interoperability and (possibly) having to design, implement, and maintain libraries to support it, possibly many times over.

RESTful, efficient way to query List.contains(element)?

Given:
/images: list of all images
/images/{imageId}: specific image
/feed/{feedId}: potentially huge list of some images (not all of them)
How would you query if a particular feed contains a particular image without downloading the full list? Put another way, how would you check whether a resource state contains a component without downloading the entire state? The first thought that comes to mind is:
Alias /images/{imageId} to /feed/{feedId}/images/{imageId}
Clients would then issue HTTP GET against /feed/{feedId}/images/{id} to check for its existence. The downside I see with this approach is that it forces me to hard-code logic into the client for breaking down an image URI to its proprietary id, something that REST frowns upon. Ideally I should be using the opaque image URI. Another option is:
Issue HTTP GET against /feed/{feedId}?contains={imageURI} to check for existence
but that feels a lot closer to RPC than I'd like. Any ideas?
What's wrong with this?
HEAD /images/id
It's unclear what "feed" means, but assuming it contains resources, it'd be the same:
HEAD /feed/id
It's tricky to say without seeing some examples to provide context.
But you could just have clients call HEAD /feed/images/{imageURI} (you would probably need to URL-encode the imageURI). The server would respond with the usual HEAD response, or with a 404 error if the resource doesn't exist. You'd need to code some logic on the server to understand the imageURI.
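Encoding the image URI as a single path segment could look like this in Python; the /feed/{feedId}/images/ layout, the base URL and the function names are hypothetical, and the Request is built but not sent:

```python
from urllib.parse import quote, unquote
from urllib.request import Request


def contains_check_request(base_url, feed_id, image_uri):
    """Build (but do not send) a HEAD request asking whether a feed contains an image."""
    # safe="" escapes "/" and ":" too, so the whole image URI stays one path segment.
    segment = quote(image_uri, safe="")
    return Request(f"{base_url}/feed/{feed_id}/images/{segment}", method="HEAD")


def image_uri_from_segment(segment):
    """Server side: recover the original image URI from the path segment."""
    return unquote(segment)
```

A 200 response means the image is in the feed and a 404 means it isn't. One caveat: Apache's default AllowEncodedSlashes Off setting refuses URLs containing %2F with a 404, so a server behind Apache would need that setting changed for this scheme to work.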
Then the client either uses the image meta info in the head, or gracefully handles the 404 error and does something else (depending on the application I guess)
There's nothing "un-RESTful" about:
/feed/{feedId}?contains={imageURI}[,{imageURI}]
It returns the subset as specified. The resource, /feed/{feedid}, is a list resource containing a list of images. How is the resource returned with the contains query any different?
The URI is unique and returns the appropriate state from the application. I can't say anything about the caching semantics of the request, but they're identical to whatever the caching semantics of the original /feed/{feedid} are; it's simply a subset.
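On the server, that subset can be computed with a few lines of Python. The function name is hypothetical, and the sketch assumes the image URIs themselves contain no literal commas, since commas separate the requested URIs:

```python
from urllib.parse import parse_qs, urlsplit


def feed_subset(feed_images, request_target):
    """Return the feed's images named in ?contains=uri[,uri...], in feed order.

    An empty list means none of the requested images are in the feed.
    """
    query = parse_qs(urlsplit(request_target).query)
    wanted = set()
    for value in query.get("contains", []):
        # parse_qs has already percent-decoded the value; commas separate URIs.
        wanted.update(uri for uri in value.split(",") if uri)
    return [uri for uri in feed_images if uri in wanted]
```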
Finally, there's nothing that says that there even exists a /feed/{feedid}/image/{imageURL}. If you want to work with the sub-resources at that level, then fine, but you're not required to. The list coming back will likely just be a list of direct image URLS, so where's the link describing the /feed/{feedid}/image/{imageURL} relationship? You were going to embed that in the payload, correct?
How about setting up an ImageQuery resource:
# Create a new query from form data where you could constrain results for a given feed.
# May or may not redirect to /image_queries/query_id.
POST /image_queries/
# Optional - view query results containing URIs to query resources.
GET /image_queries/query_id
This video demonstrates the idea using Rails.