How can I implement a RESTful Progress Indicator? - rest

I want my API to be be RESTful
Let say I have started a long running Task with POST and now want to be informed about the progress?
What is the idiomatic REST way to do this?
Poll with GET every 10 seconds?

The response to your
POST /new/long/running/task
should include a Location header. That header will point to an endpoint the client can hit to find out the status of the task. I would suggest your response look something like:
Location: http://my.server/task-status/15
{
"self": "/task-status/15",
"status": "running",
"expectedFinishedAt": <timestamp>
}
Then your client doesn't have to ping arbitrarily, because the server is giving it a hint as to when to check back. Later GETs to /task-status/15 will return updated timestamps. This saves you from having to blindly poll the server. Of course, this works much better if the server has some idea as to how long it will take to finish processing the task.

The way REST works, or rather the mechanism it uses - the HTTPS GET/POST/PUT/DELETE etc. doesn't provide a mechanism to have an event-driven mechanism where the server could send the data to the client. Though, it is theoretically be possible to have client/server functionality in both your server and in your client - though I wouldn't personally endorse this design. So having some sort of a submit API - POST/PUT and then a status query mechanism - GET would do the job.

The client should be the one giving you that information, showing you how many bytes have been sent already to the server. The server should not care about a partially uploaded resource.
That put aside, you will return a "Location" header indicating where is the resource once is created, but not earlier. I mean, when you POST you donĀ“t know which is going to be the address of the resource (that is indicated later in the Location header), so there is no reasonable way of providing an URL to check the status of the upload, because there is no reasonable way of identifying it till is done (you may try crazy things, but it is not recommendable).
Again, the client should give you that feedback, not the server.

Related

REST API - "GET /user" changes user in database

We have a simple User API including "GET /user" to request user information. When processing the request we store the current datetime as "lastVisit" in our database. As a result we have a GET request updating the user in our database, which seems to be bad practice.
As we don't handle the login process on ourselves, GET /user is the first request to our backend. We cannot use /login to retrieve and store "lastVisit".
Is it bad practice? How to solve the issue?
There's nothing wrong with updating your database when you receive a GET request - the uniform interface of HTTP constrains what the GET method token means, but you have a lot of freedom in how your server implements the handling of that request.
So that much is fine.
"lastVisit", however, may be a problem - which is to say, your interpretation of what it means that somebody asked for a copy of the page ignores various edge cases: a web spider following links to index the documents (think Google), or a smart browser that is trying to reduce latency by downloading a link before the user clicks on it.
You don't know, from the request, whether the fetch was triggered by the client, or by the general purpose agent acting in the client's stead. Similarly, you don't know about any requests for the resource that were intercepted and handled by a cache that had a valid copy of the resource.
It may be that using request handling time as a proxy for last visit is a good enough cost effective approximation of what you want to get by, but you should keep in mind that it is an estimate, not a truth.

How to send a POST request without any data, to check if that endpoint is up?

I'm in the process of writing a testing framework for an application, and I am not allowed to update, delete, move, or basically do anything with the data used by this application. For GET requests I need to test this is no problem, but PUT, POST and DELETE methods that change data this obviously is not the case.
Is there any way to send a POST request without any body, and still get a response that shows the url can take a request? Or in other words, how can I show that a url that is a POST is up and able to take requests, without actually sending the POST request and changing something in the database? (unfortunately its not possible to add a test object to database and run requests on that).
I need to do this programmatically in either Java or C# as well.
There is no general way to 'test' if a POST request will work.
Most servers will likely emit a 400 error for these endpoints, which doesn't tell you a lot.
The most standard way to see if something is able to accept a POST request at all, is probably by doing an OPTIONS request and using the Allow header in the response to get the list of supported methods.
There is no guarantee that this is going to be correct, but many modern frameworks do a decent job populating this list. This is likely going to give you the most accurate, but still imperfect results.
You should not send an empty POST request anywhere because it could have a meaning and you could make unexpected changes to a server. For this kind of introspection stuff, stick to the 'safe' methods.

REST API status when external APIs are down - Best Practices

I'm looking for guidance on good practices when it comes to returning errors from a REST API. I'm working on a new API so I can take it any direction.
In my case client invokes my API which internally invokes some external APIs. In case of success no problem, but in case of error responses from the far end(external cloud APIs) I am not sure what is industry standard for such services. Am currently thinking of returning 200 OK and then a json payload which details about the external API errors.
So what is the industry recommendations? Good practices (please explain why!) and also, from a client pov, what kind of error handling in the REST API makes life easier for the client code?
The failure you're asking about is one that has occurred within the internals of the service itself, though it is having external dependencies, so a 5XX status code range is the correct choice. 503 Service Unavailable looks perfect for the situation you've described.
5XX codes used for telling the client that even though the request was fine, the server has had some kind of problem fulfilling the request. On the other hand,
4XX codes are used to tell the client that it has done something wrong in request (and that the server is just fine, thanks).
Sections 10.4 and 10.5 of the HTTP 1.1 spec explain the different purposes of 4XX and 5XX codes.
Our colleagues have already provided the links / explanations about the HTTP status codes so you should learn them and find the most appropriate in your case.
I'll more concentrate on what can influence your decisions, assuming you've learnt the status codes.
Basically, You should understand what are the business implications of the flow triggered by client when he/she calls "your" API. The client doesn't know anything about the external cloud API you're working with and doesn't really care whether it works or not, the client works with your application.
If so, when the remote system returns some kind of error (and yes, different error statuses should give you a clue of what's wrong with the remote system), its your business decision about how to handle this error, and depending on this decision you might want to "behave" differently in the interaction with a client.
Here are some examples:
You know that the remote system breaks extremely rarely. But once its unavailable, you system doesn't work as well.
In this case you can might consider to retry the call to remote system if it failed. And if you still out of luck - then return some error status. Probably something like 5XX
You know that the data provided by remote client is not really important, on the other hand when the client calls your API its better to provide "something" even if its not really up-to-date than nothing. Think about the remote system that provides the "recommended movies" by some client id. And you're building a portal (netflix style). If this recommended movies service is down for some reason, it doesn't make sense to fail the whole portal page (think about the awful user experience). In this case you might want to "pre-cache" some generic list of movies, and use it as a fallback in case of failure of that remote service. In this case obviously you should return 2XX status in any case.
More advanced architecture. You know that the remote service fails often, and you can continue to work when its down. In this case maybe you will want to choose an "asynchronous" style of interaction with the client. For example: the client calls your rest and you respond immediately with an "Accepted" status code (202). You can save this id with status in some Database so that when the user "asks for status of the ticket by ticket id" you'll be able to query the DB. The point is that you return immediately. Then you might want to send the message with the task to some messaging system and once the consumer will pick the message, it will be processed and the db will be updated. As long as the remote service fails the message will get back to queue still being "unprocessed" (usually messaging systems can implement this behavior). Now at some point in time, the remote system starts responding, and all the messages get processed. Now their status in DB is "done".
So its up to client to ask "what happens" /or you can implement some push model with web sockets or something (its not REST style communication anymore in this case). But the point is that at some point in time the client will receive "OK, we're done with the ticket ID" (status 200). In this case the client can call a special endpoint and consume the stored results that you'll store in the DB as well (again status 200)
Bottom line, as you see, HTTP return codes are just an indicator, but its up to you how to organize the process of interconnection with the client and the relevant HTTP statuses will be derived from your decisions.
I would use 503 - Service Unavailable - as the error. Reason -
This is considering the case that the API operation cannot be completed without response from the external API. This is similar to my DB not responding. So my API is unavailable for service till the external service is back online.
As an API client, I am not concerned whether the API server internally invokes other APIs or not. I am just concerned with the result of the API server. So it does not matter to the client whether I am a proxy or not - hence, I would avoid 502 (Bad Gateway) and 504 (Gateway Timeout). These error can put the client into wrong assumption that the Gateway between the client and our service is causing trouble.
As suggested by #developerjack, I would also recommend to - "Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client."
HTTP calls are between client and server, and so the error codes should reflect where the error or fault lies on either side of that relationship. Just because its downstream to you doesn't mean the HTTP client needs to care about that.
Given this, you should be returning a 5xx error because the fault is not with the client, its with the server (or its downstream services). Returning a 2xx (see below for caveat) would be incorrect because the HTTP call did not succeed, and a 4xx would be incorrect because it's not the client's fault.
Digging into specific 5xx's you can return:
A 504 or 502 might be appropriate if you specifically want to signal that your service is acting as a gateway/proxy.
A 523 is unofficial but used by cloudflare to specifically signal that an upstream/origin service is unreachable
A 500 (with a human and machine readable error body) is a safe default that simply indicates "there is something not right with the server and its services right now".
Now, in terms of best practice, there are some techniques you can use to either reduce the 500 errors, or make it easier on the clients to respond/react to this 5xx response.
Put in place retries within your service. If your service is working and the fault is downstream, and can successfully store the client's request to retry later when downstream services are available then you can still respond with a 2xx and the client knows that their request will be submitted. A great example of this might be a user sign up workflow; you can process the signup at your side, and then queue the welcome email to retry later if your email provider is unavailable.
Have both human descriptions, machine error codes and links in your API responses. Human descriptions are useful when debugging and developing against your service. Machine codes mean clients can index/track and code up specific code paths to a given scenario, and links to your docs mean you can alway provide more information. Even better is including any specific ID's for you to trace instances of this error in case the HTTP client needs to reach out for support (though this will be heavily dependant on your logging & telemetry). Here's an example:
{
"error_code": 1234,
"description": "X happened with Y because of Z.",
"learn_more": "https://dev.my.app/errors/1234",
"id": "90daa63b-f5ac-4c33-97d5-0801ade75a5d"
}
Include a Retry-After header so that your HTTP client knows not to spam you with retries until after X time. This means less error traffic for you, and better request planning for the client.

Long-running operations in web-application

Web application operations are generally meant to be quick to avoid long wait times to users. However, some operations the web application may perform may be computationally-intensive and take a fair bit of time. What is the best practice in REST to deal with such operations that may be take several minutes yet require an immediate response to users? Is it okay for the web application to take several minutes to return the response of the HTTP request, or is it better to return a 202 response, process in the background somewhere else, and then provide some form of notification to the user?
Is it okay for the web application to take several minutes to return the response of the HTTP request
No. Part of the problem with this approach is that if the server doesn't acknowledge the request in a timely fashion, the client won't know that it reached its intended destination.
is it better to return a 202 response, process in the background somewhere else, and then provide some form of notification to the user?
Yes. That's exactly what 202 Accepted is designed for
The 202 response is intentionally noncommittal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
It can help, I think, to remember that we're talking about your integration domain; the client isn't talking to your app. It's instead talking to your API, which pretends to be a web site that the client can integrate with. So your client sends the request to the API, and the API responds with an accepted message accompanied by a bunch of links that will help the client continue with the protocol and eventually reach its goal.

GET vs POST in REST Web Service

I'm in the process of developing a REST service that allows a user to claim their listing based on a couple of pieces of information that appear on their invoice (invoice number and billing zip).
I've read countless articles and Stack Overflow questions about when to use GET and when to use POST. Overall, the common consensus is that GET should be used for idempotent operations and POST should be used for operations that create something on the server side. However, this article:
http://blog.teamtreehouse.com/the-definitive-guide-to-get-vs-post
has caused me to question using GET for this particular scenario, simply because of the fact that I'm using these 2 pieces of information as a mechanism to validate the identity of the user. I'm not updating anything on the server using this particular method call, but I also don't necessarily want to expose the information in the URL.
This is an internal web service and only the front-end that calls the service is publicly exposed, so I don't have to worry about the URL showing up in a user's browser history. My only concern would be the unlikely event that someone gain server log access, in which case, I'd have bigger problems.
I'm leaning toward POST for security reasons; however, GET feels like the correct method due to the fact that the request is idempotent. What is the recommended method in this case?
Independently of POST vs GET, I would recommend NOT basing your security as something as simple as a zip code and an invoice number. I would bet on the fact that invoice numbers are sequential (or close), and there aren't that many zip codes around - voila, I got full access to your listings.
If you're using another authentication method (typically in HTTP header), then you're good - it doesn't matter if you have an invoice number if the URL, so might as well use GET.
If you're not, then I guess POST isn't as bad as GET in term of exposing confidential content.
There isn't really any added security in a POST vs a GET. Sure, the request isn't in the URL, but it's REST we are talking about here, and the URL wouldn't be seen by a human anyway.
You question starts with some bad presumptions. Firstly, GET is not just for any old idempotent operation, it is for GETting resources from the server; it just happens that doing so should be side effect free. Secondly, the URL is not the only way for a GET request to send data to the server, you can use a payload with a GET request (at least as far as HTTP is concerned, some implementations are bad and don't support this or make it hard). Third, as pointed out, you have chosen some terrible data fields to secure your access. Finally, you are using a plain text protocol any way, so what neither method really offers and better security.
You should use the the verb that best describes what you are doing, you are getting some information from the server, so use GET. Use some proper security, such as basic HTTPS encryption. If you want to avoid these fields 'clogging' up the URL, you can send data in the payload of the request, something like:
GET /listings HTTP/1.1
Content-Type = application/json
{ "zip" : "IN0N0USZ1PC0D35",
"invoice" : "54859081145" }