REST service return collection of XML documents - rest

We are writing a REST service to query for PDF files. The service consumer wants the metadata for those PDFs, not the actual PDFs. The metadata happens to be stored as an XML document, one XML document for each PDF resource. The resource and the resource's metadata are completely different files.
What should the query response look like?
Typically we use JSON for request/response bodies. Should the response body be a JSON object that contains a collection of URLs, where each URL links to a metadata document? This seems pretty clean, but causes a lot of unnecessary network traffic because the consumer must send a GET request for each metadata document.
Should the XML of the metadata documents be embedded in the response body's JSON object? (yuck!)
Is there a solution that is both clean and efficient?

Based on some clarifying comments, I'm going to suggest that you don't write a "RESTful" API. You don't need one. You don't have objects that you need to interact with in any complex way. You don't have state that needs to be affected (REST means Representational State Transfer).
You just need an HTTP API. Just return the XML file. You can also provide an endpoint to get multiple XML documents zipped, if you want.
So do something like this:
/api/host/123 - download the PDF file (Content-Type: application/pdf) - You didn't say if you already have an endpoint for PDFs, but if you did want one, this is how I would structure it.
/api/host/123/metadata - download the XML metadata (Content-Type: text/xml)
/api/host/bulk_metadata - download a ZIP of the metadata for file IDs listed in a POST parameter (Content-Type: application/zip)
Use Content-Disposition: attachment; filename="{filename}.{pdf|xml|zip}" to tell browsers to download the content to disk rather than displaying it inline.
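As a rough illustration of the bulk endpoint described above, here is a minimal sketch in Python using only the standard library. The function names, the dictionary of file IDs to XML strings, and the default file name are all assumptions made for the example; how the response is actually sent depends on whatever web framework you use.

```python
import io
import zipfile


def build_metadata_zip(documents):
    """Pack a {file_id: xml_text} mapping into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_id, xml_text in documents.items():
            # One XML entry per requested file ID (naming scheme is hypothetical).
            archive.writestr(f"{file_id}.xml", xml_text)
    return buffer.getvalue()


def bulk_metadata_response(documents, filename="metadata.zip"):
    """Return (headers, body) for a hypothetical bulk_metadata endpoint."""
    body = build_metadata_zip(documents)
    headers = {
        "Content-Type": "application/zip",
        # Ask browsers to save the archive instead of displaying it inline.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(body)),
    }
    return headers, body
```

The consumer then makes one request for many metadata documents instead of one GET per document.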

Related

How to organise REST endpoints for a drop-wizard application?

I am new to dropwizard and REST.
My sample application is an order viewing system. Currently, I am working on a feature where the UI page consists of a set of order search criteria, a search button, and a link to download the search result as CSV. The download link is displayed only after a successful search. The application has to write the search result to a CSV file, and the returned file location will be used to download the file.
I need help in organising the endpoints for this.
Initially, I thought of an endpoint GET - /orders - text/JSON with the search criteria passed in as query params. But since I will actually be creating the CSV for every GET request, I am wondering if I am violating the HATEOAS REST constraint for GET (a resource should not be created). Or, since the actual resource is Order and not the CSV, is it OK to have the endpoint as GET?
Or, do I need multiple endpoints adhering to the REST constraints and conventions interacting with each other to produce the required result?
Like:
1. POST - /orders/csv - text/json (file name): creates the CSV file of orders and returns the file name as JSON.
2. GET - /orders/csv/<file_name>: gets the file to download.
Many thanks for your help.

best approach to design a rest web service with binary data to be consumed from the browser

I'm developing a json rest web service that will be consumed from a single web page app built with backbone.js
This API will let the consumer upload files related to some entity, like pdf reports related to a project
Googling around and doing some research on Stack Overflow, I came up with these possible approaches:
First approach: base64 encoded data field
POST: /api/projects/234/reports
{
author: 'xxxx',
abstract: 'xxxx',
filename: 'xxxx',
filesize: 222,
content: '<base64 encoded binary data>'
}
Second approach: multipart form post:
POST: /api/projects/234/reports
{
author: 'xxxx',
abstract: 'xxxx',
}
as a response I'll get a report id, and with that I shall issue another post
POST: /api/projects/234/reports/1/content
enctype=multipart/form-data
and then just send the binary data
(have a look at this: https://stackoverflow.com/a/3938816/47633)
Third approach: post the binary data to a separate resource and save the href
first I generate a random key at the client and post the binary content there
POST: /api/files/E4304205-29B7-48EE-A359-74250E19EFC4
enctype=multipart/form-data
and then
POST: /api/projects/234/reports
{
author: 'xxxx',
abstract: 'xxxx',
filename: 'xxxx',
filesize: 222,
href: '/api/files/E4304205-29B7-48EE-A359-74250E19EFC4'
}
(see this: https://stackoverflow.com/a/4032079/47633)
I just wanted to know if there's any other approach I could use, the pros/cons of each, and if there's any established way to deal with this kind of requirements
the big con I see to the first approach, is that I have to fully load and base64 encode the file on the client
some useful resources:
Post binary data to a RESTful application
What is a good way to transfer binary data to a HTTP REST API service?
How do I upload a file with metadata using a REST web service?
Bad idea to transfer large payload using web services?
https://stackoverflow.com/a/5528267/47633
My research results:
Single request (data included)
The request contains metadata. The data is a property of metadata and encoded (for example: Base64).
Pros:
transactional
always valid (no missing metadata or data)
Cons:
encoding makes the request very large
Examples:
Twitter
GitHub
Imgur
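To put a number on the "encoding makes the request very large" point, here is a small Python sketch. Base64 emits 4 output characters for every 3 input bytes, so the encoded content is roughly a third larger than the raw file. The function name and the JSON field names (borrowed from the first approach in the question) are illustrative assumptions.

```python
import base64
import json


def build_base64_request(filename, raw):
    """Build a JSON request body that embeds the file as a Base64 string."""
    encoded = base64.b64encode(raw).decode("ascii")
    body = json.dumps({
        "filename": filename,
        "filesize": len(raw),
        "content": encoded,
    })
    return encoded, body


raw = bytes(3000)  # stand-in for 3000 bytes of file content
encoded, body = build_base64_request("report.pdf", raw)
print(len(raw), len(encoded))  # prints "3000 4000"
```

The whole file also has to sit in memory on the client while it is encoded, which is the drawback the question already points out.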
Single request (multipart)
The request contains one or more parts with metadata and data.
Content types:
multipart/form-data
multipart/mixed
multipart/related
Pros:
transactional
always valid (no missing metadata or data)
Cons:
content type negotiation is complex
content type for data is not visible in WADL
Examples:
Confluence (with parts for data and for metadata)
Jira (with one part for data, metadata only part headers for file name and mime type)
Bitbucket (with one part for data, no metadata)
Google Drive (with one part for metadata and one for part data)
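For reference, a multipart request with one metadata part and one data part can be assembled by hand as in the sketch below. This is only an illustration of the wire format: the part names "metadata" and "file", the JSON encoding of the metadata, and the function name are assumptions, and a real client would normally let its HTTP library build the body (and escape the file name properly).

```python
import json
import uuid


def encode_multipart(metadata, filename, file_bytes, file_type="application/pdf"):
    """Return (content_type, body) for a two-part multipart/form-data request."""
    boundary = uuid.uuid4().hex
    lines = [
        # Part 1: the metadata, sent as JSON.
        f"--{boundary}".encode("ascii"),
        b'Content-Disposition: form-data; name="metadata"',
        b"Content-Type: application/json",
        b"",
        json.dumps(metadata).encode("utf-8"),
        # Part 2: the binary data, sent as-is (no Base64 needed).
        f"--{boundary}".encode("ascii"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        f"Content-Type: {file_type}".encode("ascii"),
        b"",
        file_bytes,
        # Closing delimiter.
        f"--{boundary}--".encode("ascii"),
        b"",
    ]
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body
```

Both parts travel in one request, which is why this variant keeps the "transactional" property listed above.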
Single request (metadata in HTTP header and URL)
The request body contains the data, and the HTTP headers and the URL contain the metadata.
Pros:
transactional
always valid (no missing metadata or data)
Cons:
no nested metadata possible
Examples:
S3 GetObject and PutObject
Two requests
One request for metadata and one or more requests for data.
Pros:
scalability (for example: data request could go to repository server)
resumable (see for example Google Drive)
Cons:
not transactional
not always valid (before the second request, one part is missing)
Examples:
Google Drive
YouTube
I can't think of any other approaches off the top of my head.
Of your 3 approaches, I've worked with method 3 the most. The biggest difference I see is between the first method and the other 2: Separating metadata and content into 2 resources
Pro: Scalability
While your solution involves posting to the same server, this can easily be changed to point the content upload to a separate server (e.g. Amazon S3)
In the first method, the same server that serves metadata to users will have a process blocked by a large upload.
Con: Orphaned Data/Added complexity
failed uploads (either metadata or content) will leave orphaned data in the server DB
Orphaned data can be cleaned up with a scheduled job, but this adds code complexity
Method II reduces the orphan possibilities, at the cost of longer client wait time as you're blocking on the response of the first POST
The first method seems the most straightforward to code. However, I'd only go with the first method if you anticipate this service being used infrequently and you can set a reasonable limit on the user file uploads.
I believe the ultimate method is number 3 (separate resource), for the main reason that it allows maximizing the value I get from the HTTP standard, which matches how I think of REST APIs. For example, and assuming a well-grounded HTTP client is in use, you get the following benefits:
Content compression: You optimize by allowing servers to respond with a compressed result if clients indicate they support it. Your API is unchanged, existing clients continue to work, and future clients can make use of it
Caching: If-Modified-Since, ETag, etc. Clients can avoid refetching the binary data altogether
Content type abstraction: For example, you require an uploaded image, it can be of types image/jpeg or image/png. The HTTP headers Accept and Content-type give us some elegant semantics for negotiating this between clients and servers without having to hardcode it all as part of our schema and/or API
On the other hand, I believe it's fair to conclude that this method is not the simplest if the binary data in question is not optional. In which case the Cons listed in Eric Hu's answer will come into play.
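The caching benefit can be sketched as a conditional GET on the separate file resource. The Python below is a simplified illustration, not a full implementation: the function names are made up, the ETag is assumed to be a hash of the content, and real If-None-Match handling also has to cope with lists of tags and the "*" value.

```python
import hashlib


def make_etag(content):
    """Derive a strong ETag from the bytes of the stored file."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def conditional_get(content, request_headers):
    """Return (status, headers, body) for a GET on the binary resource."""
    etag = make_etag(content)
    # Simplification: compares a single tag only.
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is current, so no body is sent.
        return 304, {"ETag": etag}, b""
    headers = {"ETag": etag, "Content-Type": "application/octet-stream"}
    return 200, headers, content
```

A client that stores the ETag from the first response and sends it back in If-None-Match skips the download whenever the file has not changed. This works because the binary content has its own URL, separate from the JSON metadata.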

RESTful web services - best way to return result of an operation?

I am designing a RESTful API and I would like to know what the most RESTful way is to return details about an operation.
E.g. an operation on a resource occurs when some data is POSTed to a URL. HTTP status codes will indicate either success or failure for the operation. But apart from success/failure I need to indicate some other info to the client, such as an ID number.
So my question is, should the ID number be returned in an XML document in the response content, or should it be returned in some custom HTTP header fields? Which is more in line with the principles of REST? Or am I free to choose?
Returning an entity is a perfectly valid response to an HTTP POST.
You also do not need to return XML; you could just use the content type text/plain and simply return a string value.
Using a header would require you to define a new custom header which is not ideal. I would expect clients would have an easier time parsing a response body than extracting the information from a header.
XML document makes the most sense.
If it is just an ID number, it would save overhead to return it as an HTTP header. Building a correct XML document just for a single number would add much more overhead to the response.
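One way to combine the suggestions above, sketched in Python: return the ID as a text/plain body, and also put the new resource's URL in the standard Location header, which HTTP already defines for 201 Created responses, so no custom header is needed. The Location header is an addition not mentioned in the answers, and the function name and URL layout are assumptions for the example.

```python
def created_response(collection_url, new_id):
    """Return (status, headers, body) after a POST that created a resource."""
    headers = {
        "Content-Type": "text/plain",
        # Standard header pointing at the newly created resource.
        "Location": f"{collection_url}/{new_id}",
    }
    body = str(new_id).encode("ascii")
    return 201, headers, body
```

For example, created_response("/api/projects/234/reports", 1) yields status 201, a Location of /api/projects/234/reports/1, and the body "1".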

RESTful design - how to model entity's attachments

I am trying to model entity's attachments in REST. Let's say a defect entity can have multiple attachments attached to it. Every attachment has a description and some other properties (last modified, file size...) . The attachment itself is a file in any format (jpeg, doc ...)
I was wondering how should I model it RESTfully
I thought about the following two options:
First approach (using same resource, different representations):
GET , content-type:XML on http://my-app/defects/{id}/attachments will return the defect's
attachments metadata in XML format (description, last modified, file size...)
GET , content-type:gzip on http://my-app/defects/{id}/attachments will return the defect's attachments in a zip file
GET , content-type:mime multi-part on http://my-app/defects/{id}/attachments will return the defect's attachments in a multi-part message (binary data and XML metadata altogether)
POST, content-type:XML on http://my-app/defects/{id}/attachments will create new attachment, metadata only no file attached (then the user has to send PUT request with the binary data)
POST , content-type:mime\multi-part on http://my-app/defects/{id}/attachments will create the attachment, the client can send both metadata and file itself in a single roundtrip
Second approach (separate the attachment's data from the metadata):
GET , content-type:XML on http://my-app/defects/{id}/attachments will return the defect's
attachments metadata in XML format (description, last modified, file size...)
GET , content-type:gzip on http://my-app/defects/{id}/attachments/files will return the defect's attachments binary data in a single zip
Creating a new attachment, first call:
POST, content-type:XML on http://my-app/defects/{id}/attachments will create new attachment, metadata only no file attached (then the user has to send PUT request with the binary data)
Then add the binary data itself:
POST , content-type:mime\multi-part on http://my-app/defects/{id}/attachments/{id}/file will create the attachment file
On one hand the first approach is more robust and efficient since the client can create\get the attachments metadata and binary data in single round trip. On the other hand, I am a bit reluctant to use the mime-multipart representation as it's more cumbersome to consume and produce.
EDIT: I checked out the Flickr upload REST API. It seems they are using multipart messages to include both the photo and the photo attributes.
Much of this problem has already been solved by the Atom Pub spec. See here
One thing to be careful about in your proposed solutions is that you are using content negotiation to deliver different content. I believe that is considered bad. Content negotiation should only deliver different representations of the same content.
Don't manage metadata separately. A two-part action defeats the point of REST.
One smooth GET/POST/PUT/DELETE with one -- relatively -- complex payload is what's typically done.
The fact that it's multiple underlying "objects" in "tables" is irrelevant to REST.
At the REST level, it's just one complex object's state transmitted with one message.

Restful's principle

What is the real meaning of "resources with multiple representations" in REST? After reading InfoQ's "A Brief Introduction to REST", I am confused. What are representations?
A representation is a certain way to display and/or transfer data. The same resource can be represented in different ways:
As HTML page
As an XML document
As a JSON data structure
As plain text
Even as a PDF file if that would be desired
...
You can exchange "representation" with "data format" to get a better understanding.
Examples for a "customer" resource:
HTML:
<h1>John Doe</h1>
XML:
<customer-name>John Doe</customer-name>
JSON:
{
"UserName" : "John Doe"
}
A metaphor:
Just think of a picture. It can be represented as Bitmap, PNG, JPEG and many other formats and data structures. All of them show the same picture, but they differ in their internal structure (their "representation").
Practical considerations:
In a web application environment the most common representation is (X)HTML as the standard output sent to the browser. Followed by XML and JSON when it comes to Ajax and automated access to the web application.
A Resource is basically a collection of data, in the example it is the associated data with a given customer.
When you retrieve a resource, you get a representation of it. Now for most data there are multiple representations available. Think of a table of data, or a chart, etc...
In the example you define which representation you would like to receive by setting the HTTP Accept header. In the first example in an xml format, in the second one in a vcard format.
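The idea of one resource with several representations selected by the Accept header can be sketched as follows. This Python example reuses the "customer" snippets from the earlier answer; the function name is hypothetical, and it compares against a single media type only, whereas real Accept headers can list several types with q-values.

```python
import json
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape


def render_customer(name, accept):
    """Return (content_type, body) for the same customer resource."""
    if accept == "application/json":
        return accept, json.dumps({"UserName": name})
    if accept == "application/xml":
        return accept, f"<customer-name>{xml_escape(name)}</customer-name>"
    # Fall back to HTML, the usual representation for browsers.
    return "text/html", f"<h1>{html_escape(name)}</h1>"


print(render_customer("John Doe", "application/json")[1])  # prints {"UserName": "John Doe"}
```

The URL and the underlying customer stay the same in every case; only the format of what is sent back changes.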
Take a look at this: REST Wikipedia article
A resource is something on the server, a "thing", and the article is just saying you can have multiple message formats returned about that "thing" that describe it in different ways...
Have a look at Roy Fielding's dissertation which defines REST.
Actually "representation" is more abstract than these answers suggest. "Representation" simply means what you get back is not necessarily the entire resource. For example, I have an employee record which is a resource in my corporate HR database. "Employee" is an obvious resource noun to expose through a RESTful architecture. But if you access my employee ID through the e-mail URI, the representation will be entirely different than the representation you see when accessing my employee ID through the HR benefits URI.
What DR's answer describes (JSON, XML, etc.) are actually called media-types in REST terminology. It is simply the data format of the response.