Best practice for file-based search in rest service - rest

I'm helping build a similarity search service for files. One way to search for something is with a GET request, by giving a file's URL, but I also need to allow clients to send the file directly. I have to following options:
Make the client send a GET request with a Payload; it seems this is not recommended -- HTTP GET with request body
Use something else than GET (maybe a PUT?) for file-based search. The problem is none of the other HTTP methods seems to suit this purpose.
What option would suit best here? I'm not an expert in this field, and I can't figure out what's the right thing to do in this situation.

Here is the rule I have always followed with REST.
GET - only querying data and returning a data set.
POST - Creating data in the database
PUT - Modifying data
DELETE - Destroy data in the database.
If you are sending a payload for search params, you can do a GET and put those params (assuming they are name/value pairs) in the query string of the URI.
i.e. http://my.simsearch.com?param1=first&param2=second ...
If you are actually going to change the database then a POST or PUT is in order.
I hope this helps.

I ended up sending the payload with a GET request. Even though it's not really recommended, hopefully no libraries will complain about this.

Related

Should I use GET or POST REST API Method?

I want to retrieve data about a bunch of resources. Let's say an Array of book id and the response is JSON Array of book objects. I want to send the request payload as JSON to the server.
Should I use GET and POST method?
Note:
I don't want to make multiple GET request for each book ID.
POST seems to be confusing as it is supposed to be used only when the request creates a resource or modifies the server state.
I want to retrieve data about a bunch of resources. Let's say an Array of book id and the response is JSON Array of book objects.
If you are thinking about passing the array of book id as the message body of the HTTP Request, then GET is a bad idea.
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
You should use POST instead
POST seems to be confusing as it is supposed to be used only when the request creates a resource or modifies the server state.
That's not quite right. POST can be used for anything -- see GraphQL or SOAP. But what you give up by using POST is the ability of intermediate components to participate in the conversation.
For example, for cases that are effectively read-only, you would like to use a safe method, because that allows pre-caching optimization, and automated retry of lost responses on an unreliable network. POST doesn't have extra semantic constraints, so you lose out.
What HTTP really wants is that you GET using the URI; this can be done in one of two relatively straightforward ways:
POST the ids to the server, to create a new resource (meaning that the server retains for itself a copy of the list of ids), and receive a new resource identifier back in exchange. Then GET using this new identifier any time you want to know the current representation of the results.
Encode the information you need into the URI itself. Most commonly, this is done using the query part of the URI, although that isn't strictly necessary. The downside here is that if the URI encoded representation of the array of ids is very long, you may have trouble with some implementations that enforce arbitrary URI limits.
There aren't always great answers:
The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction.
If I understand correctly, you want to get a list of all of the items in a list, in one pull. This would be possible using GET, as REST returns the JSON it can by default be up to 100 items, and you can get more items if needed by specifying $top.
As far as writing back or to the server, POST would be what your looking for, this to my understanding would need to be one for one.
you are going to use a GET-Request and put your request-data (book-id array) in the data-section of your ajax (or whatever you're going to use) request. See How to pass parameters in GET requests with jQuery

API: use HTTP headers to control business flow

I've just heard that using HTTP headers to control business flow is bad practice. Could someone give me an explanation why?
In my case, the default response contains only simple information but if you add X-Full-Info header there will be more information but request processing will take more time. Why is it bad practice? I was suggested to use query parameter.
One reason is caching. Assuming you are using GET, the result may be cached, and your http header will be ignored if the result is served from a cache. (Either a remote cache or your browser’s cache)
But if you use a query parameter to choose the results, the cache will know which result to return, and won’t return the wrong result.
You can try to solve this using the “Vary” header, but support for that header means more work on your part, and support for that header is not as widespread: https://blogs.msdn.microsoft.com/ieinternals/2009/06/17/vary-with-care/

Proper RESTful way to handle a request that is not really creating or getting something?

I am writing a little app that does one thing only: takes some user-provided data, does some analysis on it, and returns a "tag" for that data. I am thinking that the client should either GET or POST their request to /getTag in order to get a response back.
Nothing is stored on the server when the client does this, so it feels weird to use a POST. However, there is not a uniform URI for the analysis either, so using a GET feels weird, since it will return different things depending on what data is provided.
What is the best way to represent this functionality with REST?
The "best way" is to do whatever is most appropriate for your application and its needs. Not knowing that, here are a few ideas:
GET is the most appropriate verb since you're not creating or storing anything on the server, just retrieving something that the server provides.
Don't put the word get in the URI as you've suggested. Verbs like that are already provided by HTTP, so just use /tag and GET it instead.
You should use a well-understood (or "cool") URI for this resource and pass the data as query parameters. I wouldn't worry about it feeling weird (see this question's answers to find out why).
To sum up, just GET on /tag?foo=bar&beef=dead, and you're done.
POST can represent performing an action. The action doesn't have to be a database action.
What you have really created is a Remote Procedure. RPC is usually all POST. I don't think this is a good fit for REST, but that doesn't have to stop you from using simple URLs and JSON.
It seems to me like there would probably be a reason you or the user who generated the original data would want the generated tag to persist, wouldn't they?
If that's a possibility, then I'd write it as POST /tags and pass the /tags/:id resource URI back as a Location: header.
If I really didn't care about persisting the generated tag, I'd think about what the "user-generated data" was and how much processing is happening behind the scenes. If the "tag" is different enough from whatever data is being passed into the system, GET /tag might be really confusing for an API consumer.
I'll second Brian's answer: use a GET. If the same input parameters return the same output, and you're not really creating anything, it's an idempotent action and thus perfectly suited for a GET.
You can use GET and POST either:
GET /tag?data="..." -> 200, tag
The GET method means retrieve whatever information (in the form of an
entity) is identified by the Request-URI. If the Request-URI refers to
a data-producing process, it is the produced data which shall be
returned as the entity in the response and not the source text of the
process, unless that text happens to be the output of the process.
POST /tag {data: "..."} -> 200, tag
The action performed by the POST method might not result in a resource
that can be identified by a URI. In this case, either 200 (OK) or 204
(No Content) is the appropriate response status, depending on whether
or not the response includes an entity that describes the result.
according to the HTTP standard / method definitions section.
I would use GET if I were you (and POST only if you want to send files).

REST API - Opinion needed on architecture

I'm designing a REST API. The part I'm working on now involves simply reading objects in the form of JSON data off of the server and modifying them.
The resource that I am thinking of using for this looks like:
/data/{table name}/{row key}
I would like to allow GET and PUT operations on this resource.
The question that I am wrestling with is that I would like to return other data along with the JSON object such as customer messages, the amount of time it took for the round trip to the data base, etc... I would also like to allow for sending query arguments with the payload in cases where the URL would be too long if they were included there.
So the resources would work like this:
GET
GET /data/{table name}/{row key}
Server returns:
{
data:{...JSON OBJECT GOES HERE ....},
message:"customer messages go here",
responseTime:'123ms',
otherInfo:"Yada yada yada;
}
PUT
PUT GET /data/{table name}/{row key}
Client sends as payload:
{
data:{...JSON object goes here...},
queryArguments:{...extra long query arguments go here...}
}
I'm afraid this might violate the rules for proper RESTful GET and PUT resources because what you are sending to the server is not exactly what you are getting back out since other information is being included in the payloads. I'd rather not have every operation be a POST as a remedy.
Am I being too much of a stickler with this? Is there some other way I should be structuring this?
Thanks!
Edit::::
I should note that in the resource: /data/{table name}/{row key}, I used 'table name' and 'row key' for simplicity. This is for use with a noSQL database. This resource is intended to work similar to Amazon S3. "uuid" would actually be a better description than 'row key'.
As for me it just depends on how additional info is going to be used. For my customers responseTime is not a question (or at least I think so :), they just need that response. For me as developer it can help debugging. So when customer gives me slow request I can test it easy, and that extra information could help. Here I mean that it is possible to create simple url as you specified /data/{table name}/{row key} and send just response according to that request, and you can make one more url /data/{table name}/{row key}/debug or whatever else to get the same data with additional info like "reponseTime". Just an idea ;)
UPDATE: Ah yes, forgot: do not use table-name as part of your url, at least modify its name. I don't like to tell anybody how my tables are called, if somebody is going to hack my DB injecting extra code I would like her to spend more time looking for any information, instead of giving her info on a plate :)
I see nothing wrong with your approach.
But if I was implementing this scenario, I would have asked myself the following questions:
Who is my resource?
Is customer message part of the resource?
Is "response time" is part of the resource?
Let's take as an example "response time". If it's part of your resource, your approach is perfect and nothing else should be done.
However, if it's not part of the resource, return it as a HTTP header. Fair enough.
I don't see anything wrong with this, looks pretty standard to me. I'm not sure what you are planning to pass in queryArguments, is this where you would specify a callback to execute for JSON-P clients? Only thing I'd recommend you keep in mind is that REST deals with resources, and that does not necessarily map 1-to-1 with tables. Instead of using a row key you might want to have some type of GUID or UUID that you can map to that resource.

RESTful way to create multiple items in one request

I am working on a small client server program to collect orders. I want to do this in a "REST(ful) way".
What I want to do is:
Collect all orderlines (product and quantity) and send the complete order to the server
At the moment I see two options to do this:
Send each orderline to the server: POST qty and product_id
I actually don't want to do this because I want to limit the number of requests to the server so option 2:
Collect all the orderlines and send them to the server at once.
How should I implement option 2? a couple of ideas I have is:
Wrap all orderlines in a JSON object and send this to the server or use an array to post the orderlines.
Is it a good idea or good practice to implement option 2, and if so how should I do it.
What is good practice?
I believe that another correct way to approach this would be to create another resource that represents your collection of resources.
Example, imagine that we have an endpoint like /api/sheep/{id} and we can POST to /api/sheep to create a sheep resource.
Now, if we want to support bulk creation, we should consider a new flock resource at /api/flock (or /api/<your-resource>-collection if you lack a better meaningful name). Remember that resources don't need to map to your database or app models. This is a common misconception.
Resources are a higher level representation, unrelated with your data. Operating on a resource can have significant side effects, like firing an alert to a user, updating other related data, initiating a long lived process, etc. For example, we could map a file system or even the unix ps command as a REST API.
I think it is safe to assume that operating a resource may also mean to create several other entities as a side effect.
Although bulk operations (e.g. batch create) are essential in many systems, they are not formally addressed by the RESTful architecture style.
I found that POSTing a collection as you suggested basically works, but problems arise when you need to report failures in response to such a request. Such problems are worse when multiple failures occur for different causes or when the server doesn't support transactions.
My suggestion to you is that if there is no performance problem, for example when the service provider is on the LAN (not WAN) or the data is relatively small, it's worth it to send 100 POST requests to the server. Keep it simple, start with separate requests and if you have a performance problem try to optimize.
Facebook explains how to do this: https://developers.facebook.com/docs/graph-api/making-multiple-requests
Simple batched requests
The batch API takes in an array of logical HTTP requests represented
as JSON arrays - each request has a method (corresponding to HTTP
method GET/PUT/POST/DELETE etc.), a relative_url (the portion of the
URL after graph.facebook.com), optional headers array (corresponding
to HTTP headers) and an optional body (for POST and PUT requests). The
Batch API returns an array of logical HTTP responses represented as
JSON arrays - each response has a status code, an optional headers
array and an optional body (which is a JSON encoded string).
Your idea seems valid to me. The implementation is a matter of your preference. You can use JSON or just parameters for this ("order_lines[]" array) and do
POST /orders
Since you are going to create more resources at once in a single action (order and its lines) it's vital to validate each and every of them and save them only if all of them pass validation, ie. you should do it in a transaction.
I've actually been wrestling with this lately, and here's what I'm working towards.
If a POST that adds multiple resources succeeds, return a 200 OK (I was considering a 201, but the user ultimately doesn't land on a resource that was created) along with a page that displays all resources that were added, either in read-only or editable fashion. For instance, a user is able to select and POST multiple images to a gallery using a form comprising only a single file input. If the POST request succeeds in its entirety the user is presented with a set of forms for each image resource representation created that allows them to specify more details about each (name, description, etc).
In the event that one or more resources fails to be created, the POST handler aborts all processing and appends each individual error message to an array. Then, a 419 Conflict is returned and the user is routed to a 419 Conflict error page that presents the contents of the error array, as well as a way back to the form that was submitted.
I guess it's better to send separate requests within single connection. Of course, your web-server should support it
You won't want to send the HTTP headers for 100 orderlines. You neither want to generate any more requests than necessary.
Send the whole order in one JSON object to the server, to: server/order or server/order/new.
Return something that points to: server/order/order_id
Also consider using CREATE PUT instead of POST