Cache-control policy for search resource on RESTful API

I'm creating a RESTful API (using MVC.NET) to allow external access to a business system. The API includes a search resource. The resource takes the URI form "/example/search/pages/1/?query=something".
Example: To search for pizza you would access the URI "/example/search/pages/1/?query=pizza", which would give you the first 10 results. To get the second page of results you would request "/example/search/pages/2/?query=pizza", and so on.
I've used the cache-control HTTP header to enable public caching of all the resources on the API with the aim of dramatically reducing the load on the server(s) serving the API web app.
However, I'm not sure what caching policy to use for the search resource. As the resource (and its URI) vary depending on what you search for, there seems little point caching the page. What caching policy (i.e. caching via the cache-control HTTP header) do people recommend for search resources on RESTful APIs? No caching? Private caching with a very short expiry time? Public caching with a short expiry?

Most proxies will not cache anything that uses a query string.
If you want caching, I'd suggest crafting new URIs for your search request using a POST-Redirect-GET pattern.
POST /search HTTP/1.1
Content-Type: application/x-www-form-urlencoded

term=something

HTTP/1.1 303 See Other
Location: /search/something/1
This will enable caching more aggressively, but you'll have to craft those URIs and will still get hit by the initial POST. That said, if it's the query string that's problematic, this will solve the problem nicely.
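For illustration, the follow-up GET can then be answered from any shared cache (a sketch; the max-age value is arbitrary and would be tuned to how quickly your search results go stale):

GET /search/something/1 HTTP/1.1

HTTP/1.1 200 OK
Cache-Control: public, max-age=60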

Public caching with an appropriate max-age is what you want for this - the value of max-age will be application specific and is a subjective judgement call you have to make.
You have to balance the risk of serving stale responses against the reward of not having to compute every request. If this risk is extremely high then shorten the time - but just be conscious that by doing this you are increasing the load on your origin server. It's a good idea to monitor usage patterns and server loads in order to establish that your initial judgement is correct.
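For example, a response header along these lines marks the result as publicly cacheable for five minutes (the duration is purely illustrative):

Cache-Control: public, max-age=300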
This wasn't part of your question, but if I were you I would move the pagination into the query part of the URI, so
/example/search/pages/1/?query=something
would become:
/example/search?term=something&page=1
It's not essential, but it will be more intuitive for developers, and you can hit it more easily with an HTML form, as in the sketch below.
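A minimal form like this (the field names simply match the suggested URI) submits straight to that endpoint as e.g. /example/search?term=pizza&page=1:

<form action="/example/search" method="get">
  <input type="text" name="term">
  <input type="hidden" name="page" value="1">
  <button type="submit">Search</button>
</form>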

Related

How to make sure request hits REST api from a fixed page?

I have a basic HTML page, which contains a form. The form submission is handled by a RESTful backend API service (written in Spring Boot). The HTML page is unprotected for business reasons - any sort of authentication / login mechanism can't be applied to the HTML. How can I make sure only the HTML is allowed to hit the backend APIs, and not other sources? Both the HTML and the backend APIs are under the same domain. Example - example.com/index.html; example.com/getStudentList
How can I make sure, only the html is allowed to hit the backend APIs, and not other sources?
If I'm understanding things correctly, you don't want consumers of your API to authenticate with the API, because reasons? But what you want is that any client that loads the index page can access the API.
The closest implementation I can think of that would work at all like that would be to treat the API URLs like a one-time pad: you dynamically generate the HTML page, with URLs that include some difficult-to-guess token. When the API receives any request, it checks the token - if there is no token, it rejects the request (403 - Forbidden). If there is a token, it checks whether or not that token is still active; if the token has expired beyond any grace period, then the request is rejected. If the token is no longer active but still within some grace period, you might redirect the API request to a URL with a newer token (301 - Moved Permanently). If the token is active, then you serve the request.
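As a rough sketch of the exchange (the token value is made up; /getStudentList comes from the question):

GET /getStudentList HTTP/1.1
Host: example.com

HTTP/1.1 403 Forbidden            (no token presented)

GET /getStudentList?token=7f9c4e21ab HTTP/1.1
Host: example.com

HTTP/1.1 200 OK                   (token was embedded in the generated index.html and is still active)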
Mark Seemann, while trying to solve a different problem, wrote a nice little introduction: Avoiding Hackable URLs.
If that sounds to you like a session cookie -- well, you aren't far wrong. To be completely honest, I suspect that the differences are subtle, and I wouldn't be surprised to discover that I exaggerate them. The primary differences are that we are communicating things like cache invalidation and the resource lifecycle explicitly to intermediary components. The Cookie header, on the other hand, is effectively opaque.
This answer is certainly imperfect -- anybody who happens to guess the currently active URL is going to be able to access the API whether they hit the index page or not. Obscurity, rather than security.
But it might be enough to tide you over until you have reasonable requirements.

REST API GET with sensitive data

I'm designing an API with a method that should be idempotent and should not modify any data on the server. It should be a method that processes a request and returns a response for the given parameters.
One of the parameters is sensitive data. Using additional encryption is not an option. The data is already encrypted, but the security requirements are very demanding and even encrypted data should be treated very carefully.
According to the REST spec, an idempotent query method should be implemented as an HTTP GET. The problem in this case is the sensitive data, which shouldn't be passed as a GET parameter in the URL. The only option in the HTTP standard is to pass the sensitive data in the body of the HTTP request.
My question is: what is better? Break the REST API design and send the query request as a POST, or pass the encrypted data in the URL? Or maybe there is a better solution I don't see?
According to the REST spec, an idempotent query method should be implemented as an HTTP GET.
2016
As far as I can tell with my limited English, SHOULD != MUST. You won't break REST API design by sending a POST in this case. You can send your sensitive data in an HTTP header if that is possible, as sketched below. And of course you should use HTTPS if you want to send sensitive data anywhere.
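A sketch of both options (the paths, the header name and the field name are hypothetical; the point is that neither puts the sensitive value in the URL):

GET /records HTTP/1.1
Host: api.example.com
X-Encrypted-Id: <encrypted value>

POST /records/query HTTP/1.1
Host: api.example.com
Content-Type: application/json

{ "encryptedId": "<encrypted value>" }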
2019
I checked the HTTP 1.1 standard in the meantime. They don't explicitly use the MUST or SHOULD words in the specs for idempotency, but I got the impression they mean SHOULD. Another HTTP-related point: we use GET mostly because we can cache the response. You don't necessarily want to cache sensitive data, so it might not make sense to insist on GET for retrieval when the security of the parameters matters more. You can find some tips about how to set cache-control headers here, or you can read the HTTP standard for that too.
From security perspective my non-expert opinion is the following:
Normally query parameters are not that sensitive; usually they are just random ids or keywords. So maybe the problem is with your design, and you should hide these sensitive parameters (e.g. a social security number) behind random ids instead of querying them explicitly, as in the example below. Another thought: user credentials belong in the Authorization header, for example, not in the query string, so if the sensitive data is of that kind, you are doing it wrong.
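For instance (the paths and values here are hypothetical):

GET /patients?ssn=078-05-1120 HTTP/1.1      (sensitive value exposed in the URI)
GET /patients/3f9a2c71 HTTP/1.1             (opaque random id; the mapping stays on the server)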
As far as I understand, the issue with sending sensitive data in URLs is that it can show up in browser history, caches, the address bar and server logs unencrypted. Even though many people call REST web services directly from the browser via AJAX (or the fetch API), that is not the intended way they should be used. Web services are mostly for server-side usage, to scale out your application to multiple threads, cores or servers. So if you use a server-side HTTP client, which does not have a history or cache, to call the REST web service programmatically, then all you need to do is encrypt your logs. If the client has a cache, then you can encrypt that too if you feel it necessary. I think it is possible to filter these params from the logs and to store the cached content based on a salted hash of the URL, but I don't have much experience with that.
If you have a 3rd-party client or a browser where you don't have that kind of control, then you can still assume that it follows the HTTP standard. So you can use the cache-control headers to disable caching for sensitive content (see the example after this paragraph). The address bar and history are not a problem for single-page applications unless they move the sensitive data into them with the history API, but that can happen no matter what you do. It is possible to disable the Referrer header too. Only if you serve HTML with your web service will you have a problem with browsers, because that assumes that JavaScript is disabled (so you cannot use location.replace to override the browser history along with the sensitive query string) and that the browser is your REST client. I think that is a very unlikely scenario, though it is possible to do it relatively well with XML+XSL, reusing most of the code, or nowadays maybe with Node.js or some sort of transpiler between languages.
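For example, a response carrying sensitive content can opt out of caching entirely (no-store forbids any cache, shared or private, from keeping it):

HTTP/1.1 200 OK
Cache-Control: no-store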
So I think this can be solved even without POST if you do everything right. But this is just an opinion; I'm waiting for a security expert to correct me...

Can one cache and secure a REST API with Cloudflare?

I am designing a RESTful API that is intended to be consumed by a single-page application and a native mobile app. Some calls of this API return public results that can be cached for a certain time. Moreover, there is a need for rate limiting to protect the API against unauthorized users (spiders).
Can I use Cloudflare to implement caching and rate-limiting / DDOS protection for my RESTful API?
Caching: Cloudflare supports HTTP cache-control headers, so the API can decide for each entity requested via GET whether it is public and how long it can be cached.
However, it is not clear whether the cache-control header is also passed downstream to the client, so that it also triggers the browser to cache the response. This may not be desirable, as it could make troubleshooting more difficult.
Akamai has an Edge-Control header to ensure content is cached in CDN but not the browser. Can one do something similar with Cloudflare?
DDOS Protection: Cloudflare support has an article recommending that DDOS protection be disabled for backend APIs, but this does not apply to my use case where each client is supposed to make few requests to the API. The native DDOS protection actually fits my requirements for protecting the API against bots.
I need to know how I can programmatically detect when Cloudflare serves a Captcha / "I'm under attack" etc. page. This would then allow the SPA / mobile app to react intelligently, and redirect the user to a web view where she can demonstrate her "humanness".
From Cloudflare documentation, it is not obvious what HTTP status code is sent when a DDOS challenge is presented. An open-source cloudscraper to bypass Cloudflare DDOS protection seems to indicate that Captcha and challenge pages are delivered with HTTP status 200. Is there a better way than parsing the request body to find out whether DDOS protection kicked in?
Cloudflare apparently uses cookies to record who solved the Captcha successfully. This obviously creates some extra complexity with native apps. Is there a good way to transfer the Cloudflare session cookies back to a native app after the challenge has been solved?
Probably this is something of an advanced Cloudflare use case - but I think it's promising and would be happy to hear if anyone has experience with something like this (on Cloudflare or another CDN).
Cloudflare has published a list of best practices for using it with APIs.
TL;DR, they recommend setting a page rule that matches all API requests and putting the following settings on it:
Cache Level: Bypass
Always Online: OFF
Web Application Firewall: OFF
Security Level: Anything but "I'm under attack"
Browser Integrity Check: OFF
Yes, CloudFlare can help with DDoS protection, and no, it does not implement caching and rate-limiting for your API. You have to implement those yourself, or use a framework that does.
You can use CloudFlare to protect your API endpoint by using it as a proxy.
CloudFlare protects the entire site, but you can use page rules to tweak the settings for your API endpoint.
Example: https://api.example.com/*
Reduce the security for this rule to low or medium so as not to show a captcha.
APIs are not meant to show captchas; you protect them with authorization and access codes.
You can implement HTTP Strict Transport Security (HSTS) and Access-Control headers on your responses.
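For illustration, headers along these lines would do it (the max-age and the allowed origin are placeholder values you would tune to your deployment):

Strict-Transport-Security: max-age=31536000; includeSubDomains
Access-Control-Allow-Origin: https://app.example.com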
Cloud hosting providers (e.g. DigitalOcean, Vultr, etc.) have free or paid DDoS protection. You can subscribe to it for just that public-facing VM. This will be a big plus, because now you have double DDoS protection.
For caching APIs:
Create a page rule like https://api.example.com/*.json
Set the Caching Level for that rule such that CloudFlare caches it on its servers for a specific duration.
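If you also let CloudFlare respect origin cache headers, the API itself can state how long the CDN may keep a response; a sketch (the durations are arbitrary; s-maxage only applies to shared caches such as the CDN, while max-age=0 keeps browsers revalidating):

Cache-Control: public, s-maxage=300, max-age=0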
There are so many other ways you can protect APIs. Hope this answer has been of help.
This is a 5-year-old question from #flexresponsive, with the most recent answer having been written 3 years ago and commented upon 2 years ago. While I'm sure the OP has by now found a solution, be it within CloudFlare or elsewhere, I will update the solutions in a contemporary (2020) fashion while staying within CloudFlare. Detailed page rules are always a good idea for anyone; however, for the OP's specific needs, this specific set in combination with a "CloudFlare Workers" script will be of benefit:
Edge Cache TTL: set to the time necessary for CloudFlare to cache your API content at its "edge" (which edge node / server farm location serves a route depends on one's account plan, with "Free" having the lowest priority and thus being more likely to serve content from a location with higher latency to your consumers).
However, with Edge Cache TTL > 0 (basically, using it at all), you will not be able to set the following, which may or may not be of importance to your API:
Cache Deception Armor: ON
Origin Cache Control: ON, if the "Cache Everything" setting below is being used and you want to do the following:
Use Cache Level: Cache Everything in combination with a worker that runs during calls to your API. Staying on topic, I'll show two headers to use, specific to your API's route / address.
addEventListener("fetch", event => {
  event.respondWith(fetchAndReplace(event.request));
});

async function fetchAndReplace(request) {
  const response = await fetch(request);

  // Only touch API-style payloads (e.g. application/json); pass everything else through untouched.
  let type = response.headers.get("Content-Type") || "";
  if (!type.startsWith("application/")) {
    return response;
  }

  // Copy the headers so they can be modified, then set the two cache-specific ones.
  let newHeaders = new Headers(response.headers);
  newHeaders.set("Cache-Control", "s-maxage=86400");  // shared caches (the CDN) may keep this for a day
  newHeaders.set("Clear-Site-Data", '"cache"');       // ask the consumer's browser to clear its cache

  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers: newHeaders
  });
}
In setting the two cache-specific headers, you are saying "only shared proxies can cache this". It's impossible to fully control how any shared proxy actually behaves, though, so depending on the API payload, the no-transform value may be of value if that's a concern. For example, if only JSON is in play, then you'd be fine without it unless a misbehaving cache decides to mangle it along the way; but if, say, you'll be serving anything requiring an integrity hash or a nonce, then using no-transform is a must to ensure that the payload isn't altered at all and thus left unverifiable as the file coming from your API. The Clear-Site-Data header with the "cache" value instructs the consumer's browser to essentially clear its cache as it receives the payload. "cache" needs to be within double quotes in the HTTP header for it to function.
As for running checks to ensure that your consumers aren't experiencing a blocking situation where the API payload cannot be transmitted directly to them and an hCaptcha kicks in: one approach is inspecting the final destination for a query string containing a cf string (I don't recall the exact layout, but it would definitely have the CloudFlare cf in it and definitely not be where you want your consumers landing). Beyond that, the "normal" DDoS protection that CloudFlare uses would not be triggered by normal interaction with the API. I'd also recommend not following CloudFlare's specific advice to use a security level of anything but "I'm Under Attack"; on that point I must point out that even though the 5-second redirect won't occur on each request, hCaptchas will be triggered on security levels Low, Medium & High. Setting the security level to "Essentially Off" does not mean a security level of null; additionally, the WAF will catch standard violations, and that of course may be adjusted according to what is being served from your API.
Hopefully this is of use, if not to the OP at least to other would-be visitors.

GET vs POST in REST Web Service

I'm in the process of developing a REST service that allows a user to claim their listing based on a couple of pieces of information that appear on their invoice (invoice number and billing zip).
I've read countless articles and Stack Overflow questions about when to use GET and when to use POST. Overall, the common consensus is that GET should be used for idempotent operations and POST should be used for operations that create something on the server side. However, this article:
http://blog.teamtreehouse.com/the-definitive-guide-to-get-vs-post
has caused me to question using GET for this particular scenario, simply because I'm using these 2 pieces of information as a mechanism to validate the identity of the user. I'm not updating anything on the server using this particular method call, but I also don't necessarily want to expose the information in the URL.
This is an internal web service and only the front-end that calls the service is publicly exposed, so I don't have to worry about the URL showing up in a user's browser history. My only concern would be the unlikely event that someone gains server log access, in which case I'd have bigger problems.
I'm leaning toward POST for security reasons; however, GET feels like the correct method due to the fact that the request is idempotent. What is the recommended method in this case?
Independently of POST vs GET, I would recommend NOT basing your security on something as simple as a zip code and an invoice number. I would bet that invoice numbers are sequential (or close), and there aren't that many zip codes around - voilà, I have full access to your listings.
If you're using another authentication method (typically in an HTTP header), then you're good - it doesn't matter if you have an invoice number in the URL, so you might as well use GET.
If you're not, then I guess POST isn't as bad as GET in terms of exposing confidential content.
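For instance, with the credential carried in a header, the lookup parameters left in the URL become relatively harmless (the path, parameter names and token below are hypothetical):

GET /listings/claim?invoice=54859081145&zip=90210 HTTP/1.1
Authorization: Bearer <token issued by your auth layer>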
There isn't really any added security in a POST vs a GET. Sure, the request isn't in the URL, but it's REST we are talking about here, and the URL wouldn't be seen by a human anyway.
Your question starts with some bad presumptions. Firstly, GET is not just for any old idempotent operation; it is for GETting resources from the server - it just happens that doing so should be side-effect free. Secondly, the URL is not the only way for a GET request to send data to the server: you can use a payload with a GET request (at least as far as HTTP is concerned; some implementations are bad and don't support this or make it hard). Third, as pointed out, you have chosen some terrible data fields to secure your access. Finally, you are using a plain-text protocol anyway, so neither method really offers any better security.
You should use the verb that best describes what you are doing: you are getting some information from the server, so use GET. Use some proper security, such as basic HTTPS encryption. If you want to avoid these fields 'clogging' up the URL, you can send the data in the payload of the request, something like:
GET /listings HTTP/1.1
Content-Type: application/json

{ "zip" : "IN0N0USZ1PC0D35",
  "invoice" : "54859081145" }

RESTful HATEOAS Client Url

I'm reasonably sure I understand the server-side of HATEOAS design - returning state URL's in the response - but I'm slightly confused about how to design a client to accept these.
For instance, we access a resource at //somehost.com/resource/1 - this provides us with the resource data and links. We'll assume a link indicating a 'new' action via POST to //somehost.com/resource is returned. Now I understand that posting some data to that URL creates a new resource and provides a response, but where does the form to POST that data reside? I've seen implementations where //somehost.com/resource/1/new provides a form which POSTs to /resource, but that URL itself contains a verb, and seems to violate REST.
I think my confusion lies in that I'm implementing a RESTful API and a client to consume it, within the same application.
Is there some sort of best-practice for this sort of thing?
I've seen implementations where //somehost.com/resource/1/new provides a form which POSTS to /resource, but that URL itself contains a verb, and seems to violate REST.
This is incorrect. A URI containing a verb does not, in itself, violate any REST constraint. It is only when that URI represents an action that this becomes a violation. If you can perform a GET request on the URL and receive some meaningful resource (such as a "create new resource" form), then that is perfectly RESTful, and good practice.
My own API is exactly as you describe: /{collection}/new returns a form. /new is just shorthand for a hypothetical /new-resource-creation-form and still represents a noun, and only supports GET requests (HEAD, OPTIONS and TRACE notwithstanding).
What HATEOAS prohibits is the user agent being required to know that, in order to create a new resource, it must add /new to the name of the collection.
Basically, if you implement your API as (X)HTML, and can surf it in a browser and perform all actions (AJAX may be required for non-POST form submissions until HTML and browsers catch up with HTTP), then it complies with the hypermedia constraint of REST.
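As a concrete sketch (the collection name and field are made up), a GET of /articles/new could return something like:

HTTP/1.1 200 OK
Content-Type: text/html

<form action="/articles" method="post">
  <input type="text" name="title">
  <button type="submit">Create</button>
</form>

Submitting the form POSTs to the collection itself, so the client never has to construct a URI or know the /new convention in advance.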
EDIT promoted from comments:
As long as the response negates any need for a priori knowledge, it conforms to the hypermedia constraint. If the client claims to understand HTML, and you send back a response containing a link to an external stylesheet or javascript (no matter where that is hosted) which the client needs to be able to render the page correctly, then it is reasonable to say that the constraint is met. The client should know how to handle all media types it claims to support. A normal human web browser is the perfect example of a client with no out-of-band knowledge about any one HTTP service (web site).
Just to say it explicitly, a web site is a kind of HTTP service. Web browsers do not treat different web sites differently. In order to search for products on Amazon, you load the Amazon service endpoint at http://amazon.com/ and follow links or fill out forms provided in that response. In order to search for products on eBay, you load the eBay service endpoint at http://ebay.com/ and do the same.
Browsers don't know in advance that for searching eBay you must do this, but for searching Amazon you have to do that. Browsers are ignorant. Clients for other HTTP services should be ignorant too.
Yes, you could provide a URI that returns a form for resource creation. Conceivably the form could be used for dynamic discovery of the elements needed to construct a new resource (but you'd want to decide how practical that would really be in a machine-to-machine environment).
Unless there is a requirement that somehow the API has an exact browser-surfable equivalent, the documentation of the media type will describe what elements are needed.
Remember that documentation of media types and the allowed HTTP verbs for a resource is not contrary to RESTful principles. Look at the SunCloud API for an example.
Indeed, according to your example, POST'ing to
//somehost.com/resource
to create a new resource is more standard than first returning a form
//somehost.com/resource/1/new
and THEN POST'ing to
//somehost.com/resource
anyway.