Can one cache and secure a REST API with Cloudflare?

I am designing a RESTful API that is intended to be consumed by a single-page application and a native mobile app. Some calls of this API return public results that can be cached for a certain time. Moreover, the API needs rate protection to guard against unauthorized users (spiders).
Can I use Cloudflare to implement caching and rate-limiting / DDOS protection for my RESTful API?
Caching: Cloudflare supports HTTP cache control headers, so the API can decide for each entity requested via GET whether it is public and how long it can be cached.
However, it is not clear whether the cache control header is also passed downstream to the client, which would trigger the browser to cache the response as well. This may not be desirable, as it could make troubleshooting more difficult.
Akamai has an Edge-Control header to ensure content is cached in CDN but not the browser. Can one do something similar with Cloudflare?
DDOS Protection: Cloudflare support has an article recommending that DDOS protection be disabled for backend APIs, but this does not apply to my use case where each client is supposed to make few requests to the API. The native DDOS protection actually fits my requirements for protecting the API against bots.
I need to know how I can programmatically detect when Cloudflare serves a Captcha / "I'm Under Attack" page, etc. This would then allow the SPA / mobile app to react intelligently and redirect the user to a web view where she can demonstrate her "humanness".
From Cloudflare documentation, it is not obvious what HTTP status code is sent when a DDOS challenge is presented. An open-source cloudscraper to bypass Cloudflare DDOS protection seems to indicate that Captcha and challenge pages are delivered with HTTP status 200. Is there a better way than parsing the request body to find out whether DDOS protection kicked in?
Cloudflare apparently uses cookies to record who solved the Captcha successfully. This obviously creates some extra complexity with native apps. Is there a good way to transfer the Cloudflare session cookies back to a native app after the challenge has been solved?
Probably this is something of an advanced Cloudflare use case - but I think it's promising and would be happy to hear if anyone has experience with something like this (on Cloudflare or another CDN).

Cloudflare has published a list of best practices for using it with APIs.
TL;DR, they recommend setting a page rule that matches all API requests and putting the following settings on it:
Cache Level: Bypass
Always Online: OFF
Web Application Firewall: OFF
Security Level: Anything but "I'm under attack"
Browser Integrity Check: OFF
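For reference, the same settings can also be created programmatically via Cloudflare's v4 page-rules API. The sketch below is only an illustration: the zone ID and API token are placeholders, and the action ids/values should be double-checked against the current API docs for your plan.
// Sketch (Node 18+, run as an ES module): create a page rule matching the API
// with the settings listed above. ZONE_ID and API_TOKEN are placeholders.
const ZONE_ID = "your-zone-id";
const API_TOKEN = "your-api-token";
const res = await fetch(`https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/pagerules`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_TOKEN}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    targets: [{ target: "url", constraint: { operator: "matches", value: "api.example.com/*" } }],
    actions: [
      { id: "cache_level", value: "bypass" },
      { id: "always_online", value: "off" },
      { id: "waf", value: "off" },
      { id: "security_level", value: "medium" }, // anything but "under_attack"
      { id: "browser_check", value: "off" }
    ],
    status: "active"
  })
});
console.log(await res.json());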

Yes, CloudFlare can help with DDoS protection, and no, it does not implement caching and rate limiting for your API by itself. You have to implement those yourself, or use a framework that does.
You can use CloudFlare to protect your API endpoint by using it as a proxy.
CloudFlare protects the entire site, but you can use page rules to tweak the settings for your API endpoint.
Example: https://api.example.com/*
Reduce the security level for this rule to Low or Medium so as not to show a captcha.
APIs are not meant to show captchas; you protect them with authorization and access codes.
You can also implement HTTP Strict Transport Security and Access-Control headers on your responses.
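If you are already fronting the API with CloudFlare, a minimal sketch of adding those headers in a CloudFlare Worker could look like the following; the header values and the allowed origin are illustrative assumptions, not recommendations for your specific setup.
// Sketch: append HSTS and a CORS allow-origin header to every API response.
addEventListener("fetch", event => {
  event.respondWith(addSecurityHeaders(event.request));
});
async function addSecurityHeaders(request) {
  const response = await fetch(request);
  const headers = new Headers(response.headers);
  // Force HTTPS on this host for a year (illustrative max-age).
  headers.set("Strict-Transport-Security", "max-age=31536000");
  // Only the SPA's origin (assumed) may call the API from a browser.
  headers.set("Access-Control-Allow-Origin", "https://app.example.com");
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers
  });
}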
Cloud hosting providers (e.g. DigitalOcean, Vultr, etc.) have free or paid DDoS protection. You can subscribe to it for just that public-facing VM. This will be a big plus because you then have double DDoS protection.
For caching APIs:
Create a page rule like https://api.example.com/*.json
Set the Caching Level for that rule such that CloudFlare caches it on its servers for a specific duration.
There are so many other ways you can protect APIs. Hope this answer has been of help.

This is a 5-year-old question from #flexresponsive, with the most recent answer having been written 3 years ago and commented upon 2 years ago. While I'm sure the OP has by now found a solution, be it within CloudFlare or elsewhere, I will update the solutions in a contemporary (2020) fashion while staying within CloudFlare. Detailed Page Rules are always a good idea for anyone; however, for the OP's specific needs, this specific set in combination with a CloudFlare Workers script will be of benefit:
Edge Cache TTL: set to the time for which CloudFlare should cache your API content at its "Edge" (which edge node / server farm location serves a request depends on one's account plan, with "Free" being of lowest priority and thus more likely to serve content from a location with a higher latency to your consumers).
However, setting Edge Cache TTL > 0 (basically, using it at all) will not allow setting the following, which may or may not be of importance to your API:
Cache Deception Armor: ON
Origin Cache Control: ON if #3 is being used and you want to do the following:
Cache Level: Cache Everything, in combination with a Worker that runs during calls to your API. Staying on topic, I'll show two headers to use, specific to your API's route/address.
addEventListener("fetch", event => {
  event.respondWith(fetchAndReplace(event.request));
});
async function fetchAndReplace(request) {
  const response = await fetch(request);
  let type = response.headers.get("Content-Type") || "";
  // Only touch application/* payloads (e.g. application/json); pass everything else through unchanged.
  if (!type.startsWith("application/")) {
    return response;
  }
  let newHeaders = new Headers(response.headers);
  // Let shared caches (CloudFlare's edge) keep the response for a day...
  newHeaders.set("Cache-Control", "s-maxage=86400");
  // ...and tell the consumer's browser to clear its own cached copy.
  newHeaders.set("Clear-Site-Data", '"cache"');
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers: newHeaders
  });
}
In setting the two cache-specific headers, you are saying "only shared proxies can cache this". It's impossible to fully control how any shared proxy actually behaves, though, so depending on the API payload, the no-transform value may be worth adding if that's a concern. If only JSON is in play, you'd be fine without it unless a misbehaving cache decides to mangle it along the way; but if, say, you'll be serving anything requiring an integrity hash or a nonce, then using no-transform is a must to ensure that the payload isn't altered at all, since an altered payload cannot be verified as the file coming from your API. The Clear-Site-Data header with the "cache" value instructs the consumer's browser to essentially clear its cache as it receives the payload. "cache" needs to be within double quotes in the HTTP header for it to function.
As for running checks to ensure that your consumers aren't hitting a blocking situation where the API payload cannot be delivered to them directly and an hCaptcha kicks in: inspect the final destination for a query string containing a cf marker (I don't recall the exact layout, but it will definitely have the CloudFlare cf in it and definitely not be where you want your consumers landing). Beyond that, the "normal" DDoS protection that CloudFlare uses should not be triggered by normal interaction with the API. I'd also recommend not following CloudFlare's specific advice to use a security level of anything but "I'm Under Attack"; on that point I must point out that even though the 5-second redirect won't occur on each request, hCaptchas will be triggered on security levels Low, Medium and High. Setting the security level to "Essentially Off" does not mean a security level of null; additionally, the WAF will catch standard violations, and that of course may be adjusted according to what is being served from your API.
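For what it's worth, a minimal client-side sketch of that check might look like the following. The markers ("cdn-cgi", "cf_chl") and the assumption that the API always answers with JSON are my guesses based on what challenge pages have looked like to me, not a documented CloudFlare contract, so verify them against a real challenge response before relying on them.
// Sketch: detect whether a fetch to the API was answered with a challenge page
// instead of the expected JSON payload. Markers are assumptions, not a contract.
async function callApi(url) {
  const response = await fetch(url, { redirect: "follow" });
  const finalUrl = response.url || "";
  const contentType = response.headers.get("Content-Type") || "";
  const looksLikeChallenge =
    finalUrl.includes("cdn-cgi") ||      // challenge assets/redirects tend to live under /cdn-cgi/
    finalUrl.includes("cf_chl") ||       // challenge query-string marker (assumed)
    !contentType.includes("application/json"); // the API itself is assumed to always return JSON
  if (looksLikeChallenge) {
    // Hand the user off to a web view where the challenge can be solved.
    throw new Error("CloudFlare challenge detected");
  }
  return response.json();
}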
Hopefully this is of use, if not to the OP at least to other would-be visitors.

Related

How to identify browser vs. backend consumers of a REST service

The situation
There's a REST API which is currently consumed by other backend systems.
Now, that same REST API is going to be used by a single-page application soon, and that SPA needs some additional security measures (CSRF token verification, among others). However, those additional security measures should be enforced only against end users running ordinary browsers, and not against other backends, so that those existing backends keep working without any changes.
The Question
How do you distinguish between when a browser is consuming a REST API and when another backend is consuming it?
Is there a header that will be sent by any modern browsers and can't be turned off or can't be tampered with?
Maybe the User-Agent? Or do REST libs (in any language) send that too?
Or the Referer? Or Origin? Or some other headers?
Or something else other than a header?
Is there a header that will be sent by any modern browsers and can't be turned off or can't be tampered with?
As far as I know, you aren't going to find what you are looking for.
User-Agent is close
The "User-Agent" header field contains information about the user agent originating the request, which is often used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use. A user agent SHOULD send a User-Agent field in each request unless specifically configured not to do so.
But it certainly isn't "tamper-proof"; it's just a text header, many user agents will allow you to customize it, etc.
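To illustrate how weak a signal it is, any non-browser client can send whatever User-Agent it likes. A quick sketch (Node 18+, run as an ES module; the URL and UA string are just placeholders):
// A non-browser client impersonating a desktop browser with one header.
const response = await fetch("https://api.example.com/resource", {
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0"
  }
});
console.log(response.status);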

Protecting REST API behind SPA against data thieves

I am writing a REST API gateway for an Angular SPA, and I am confronted with the problem of securing the data exposed by the API for the SPA against "data thieves". I am aware that I can't do much against HTML scraping, but at least I don't want to offer such data thieves the user experience and full power of the JSON we send to the SPA.
The difference from most "tutorials" and threads about this topic is that I am exposing this data on a public website (which means no user authentication is required) that offers valuable statistics about a video game.
My initial idea on how to protect the REST API for the SPA:
Using JWTs everywhere. When a visitor opens the website for the very first time, the SPA requests a JWT from my REST API and saves it in the HTTPS cookies. For all requests, the SPA has to use the JWT to get a response.
Problems with that approach
The data thief could simply request the OAuth token from our endpoint as well. I have no way to verify whether the token was actually requested by my SPA or by the data thief.
Even if I solved that, the attacker could read the saved JWT from the HTTPS cookies and use it in his own application. Sure, I could add an expiration time to the JWT.
My question:
I am under the impression that this is a common problem, so I am wondering whether there are any good solutions to prevent anything other than the SPA from having direct access to my REST API responses.
From the API's point of view, your SPA is in no way different from any other client. You obviously can't include a secret in the SPA, as it is sent to anybody and cannot be protected. Also, the requests it makes to the API can easily be sniffed and copied by another client.
So in short, as discussed many times here, you can't authenticate the client application. Anybody can create a different client if they want.
One thing you can actually do is check the referer/origin of requests. If a client is running in a browser, the requests it can make are somewhat limited, and one such limitation is the Referer and Origin headers, which are always controlled by the browser and not by JavaScript. So you can actually make sure that if (and only if!) the client is running in an unmodified browser, it was downloaded from your domain. This is the default in browsers, by the way, so if you are not sending CORS headers, you already have this (browsers enforce it). However, this does not keep an attacker from building and running a non-browser client and faking any Referer or Origin he likes, or just disregarding the same-origin policy.
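A minimal sketch of such a server-side check, Node/Express-style (the allowed origin is an assumption, and as noted above it only constrains well-behaved browsers):
// Reject browser-originated requests whose Origin/Referer don't match the SPA's domain.
const express = require("express");
const app = express();
const ALLOWED_ORIGIN = "https://app.example.com"; // placeholder
app.use((req, res, next) => {
  const origin = req.get("Origin") || req.get("Referer") || "";
  if (origin && !origin.startsWith(ALLOWED_ORIGIN)) {
    return res.status(403).json({ error: "forbidden" });
  }
  next();
});
app.get("/api/data", (req, res) => res.json({ ok: true }));
app.listen(3000);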
Another thing you could do is change the API regularly, just enough to stop rogue clients from working (changing your own client at the same time, of course). Obviously this is not secure at all, but it can be annoying enough for an attacker. If downloading all your data once is the concern, this again doesn't help at all.
Some real things you should consider though are:
Does anybody actually want to download your data? How much is it worth? Most of the time, nobody wants to create a different client, and nobody is that interested in the data.
If it is that interesting, you should implement user authentication at the very least, and cover the remaining risk via the points below and/or legally in your contracts.
You could implement throttling so as not to allow bulk downloading (a minimal sketch follows at the end of this answer). For example, if the typical user accesses 1 record every 5 seconds and 10 records altogether, you can build rules based on, say, the client IP to reasonably limit user access. Note though that rate limiting must be based on a parameter the client can't modify arbitrarily, and without authentication that's pretty much the client IP only, and you will face issues with users behind a NAT (i.e. corporate networks, for example).
Similarly, you can implement monitoring to discover whether somebody is downloading more data than would be normal or necessary. However, without user authentication, your only option will be to ban the client IP. So again it comes down to knowing who the user is, i.e. authentication.
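As referenced above, here is a minimal in-memory throttling sketch keyed on the client IP, Node/Express-style. The limits are illustrative, and a real deployment would want a shared store (e.g. Redis) instead of process memory.
// Allow at most MAX_REQUESTS per client IP per WINDOW_MS; remember NAT means one IP can be many users.
const express = require("express");
const app = express();
const WINDOW_MS = 60 * 1000;
const MAX_REQUESTS = 12;
const hits = new Map(); // ip -> { count, windowStart }
app.use((req, res, next) => {
  const now = Date.now();
  const entry = hits.get(req.ip) || { count: 0, windowStart: now };
  if (now - entry.windowStart > WINDOW_MS) {
    entry.count = 0;
    entry.windowStart = now;
  }
  entry.count += 1;
  hits.set(req.ip, entry);
  if (entry.count > MAX_REQUESTS) {
    return res.status(429).json({ error: "too many requests" });
  }
  next();
});
app.get("/records/:id", (req, res) => res.json({ id: req.params.id }));
app.listen(3000);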

Secure communication between Web site and backend

I am currently implementing a Facebook Chat Extension which basically is just a web page displayed in a browser provided by the Facebook Messenger app. This web page communicates with a corporate backend over a REST API (implemented with Python/Flask). Communication is done via HTTPS.
My question: how do I secure the communication between the Web page and the backend, in the sense that the backend cannot be accessed by any clients that we do not control?
I am new to the topic, and would like to avoid making beginners' mistakes or add too complicated protocols to our tech stack.
Short answer: you can't. Everything can be faked with, e.g., curl and some scripting.
Slightly longer:
You can make it harder. Non-browser clients have to implement everything you do to authenticate your app (like client-side certificates and signed requests), forcing them to reverse-engineer every obfuscation you add.
The low-hanging fruit is to use CORS and set the Access-Control-Allow-Origin header to your domain. Browsers will respect your setting and won't allow cross-origin requests to your API (they do an OPTIONS preflight request to determine that).
But then again a non official client could just use a proxy.
You can't be 100% sure that the given header data from the client is true. It's more about honesty and less about security. ("It's a feature - not a bug.")
Rather, think about what could happen if someone used your API in a malicious way (DDoS or data leak), and how he would use it. There are probably patterns that let you recognize an attacker (like an unusual number of requests).
After you have analyzed this situation, you can find more information about the right approach to securing your API here: https://www.incapsula.com/blog/best-practices-for-securing-your-api.html

Securing REST API

I have a website which consumes the rest APIs exposed on the webserver.
This is a content website, free to the public. Thus, anybody can read the content by navigating through it (which calls different REST APIs in the background). At the same time, I am worried that somebody could figure out my endpoints from the developer tools in the browser and call them (millions of times) to bring my server down. I need to secure my REST APIs against everything except browsers. How do I go about this?
I like to see this problem as separate from your REST API.
Your API is a service on top of which you'll want to build some security. Therefore, security does not actually affect the design of your API.
One straightforward thing to do is control the incoming traffic. There are patterns associated with DoS or DDoS attacks that can be recognized in order to take countermeasures. This is what Intrusion Prevention Systems (IPS) do.
Please, take a look here and, if you are more interested in something deeper, here.
If your requests are not authenticated (i.e. your apis are public) I think there's nothing more you can do.
You can secure your HTTP endpoints by using the SHA-256 cryptographic algorithm. SHA-256 is secure and fast enough to be used to protect HTTP calls, for example by signing the request and the response with it.
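A minimal sketch of that idea, using an HMAC with SHA-256 and a shared secret (the header name and the secret handling are my own assumptions, not a standard):
// Node: sign an outgoing request body with HMAC-SHA256 and verify it on the server.
const crypto = require("crypto");
const SECRET = "shared-secret"; // distribute out of band, never embed in client-side JS
function sign(body) {
  return crypto.createHmac("sha256", SECRET).update(body).digest("hex");
}
function verify(body, signature) {
  const expected = Buffer.from(sign(body));
  const given = Buffer.from(signature || "");
  return expected.length === given.length && crypto.timingSafeEqual(expected, given);
}
// Client side: send the signature alongside the payload, e.g. in an "x-signature" header.
const payload = JSON.stringify({ query: "pizza" });
const headers = { "Content-Type": "application/json", "x-signature": sign(payload) };
// Server side: recompute and compare before trusting the request.
console.log(verify(payload, headers["x-signature"])); // true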

Cache-control policy for search resource on RESTful API

I'm creating a RESTful API (using MVC.NET) to allow external access to a business system. The API includes a search resource. The resource takes the URI form "/example/search/pages/1/?query=something".
Example: to search for pizza you would access the URI "/example/search/pages/1/?query=pizza", which would give you the first 10 results. To get the second page of results you would request "/example/search/pages/2/?query=pizza", etc.
I've used the cache-control HTTP header to enable public caching of all the resources on the API with the aim of dramatically reducing the load on the server(s) serving the API web app.
However, I'm not sure what caching policy to use for the search resource. As the resource (and its URI) vary depending on what you search for, there seems little point in caching the page. What caching policy (i.e. caching via the cache-control HTTP header) do people recommend for search resources on RESTful APIs? No caching? Private caching with a very short expiry time? Public caching with a short expiry?
Most proxies will not cache anything that uses a query string.
If you want caching, I'd suggest crafting new URIs for your search request using a POST-Redirect-GET pattern.
POST /search
Content-Type: application/x-www-form-urlencoded
term=something
303 See Other
Location: /search/something/1
This will enable caching more aggressively, but you'll have to craft those URIs and will still get hit by the initial POST. That said, if it's the query that's problematic, this will solve the problem nicely.
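A minimal sketch of that pattern, Node/Express-style rather than MVC.NET, purely for illustration (the route names and max-age are assumptions):
// POST-Redirect-GET: search results live at clean, cacheable, query-string-free URIs.
const express = require("express");
const app = express();
app.use(express.urlencoded({ extended: false }));
// POST /search with form field "term" redirects (303) to a cacheable URI.
app.post("/search", (req, res) => {
  const term = encodeURIComponent(req.body.term || "");
  res.redirect(303, `/search/${term}/1`);
});
// The GET handler serves the page and marks it publicly cacheable for a short time.
app.get("/search/:term/:page", (req, res) => {
  res.set("Cache-Control", "public, max-age=60"); // illustrative max-age
  res.json({ term: req.params.term, page: Number(req.params.page), results: [] });
});
app.listen(3000);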
Public caching with an appropriate max-age is what you want for this; the value of max-age will be application-specific and is a subjective judgement call you have to make.
You have to balance the risk of serving stale responses against the reward of not having to compute every request. If this risk is extremely high, then shorten the time, but be conscious that by doing this you are increasing the load on your origin server. It's a good idea to monitor usage patterns and server loads in order to establish that your initial judgement is correct.
This wasn't part of your question, but if I were you I would move the pagination into the query part of the URI, so
/example/search/pages/1/?query=something
would become:
/example/search?term=something&page=1
It's not essential, but it will be more intuitive for developers, and you can hit it more easily with an HTML form.