I need to set the caching duration for PDF documents to 1 hour, so that the PDFs are refreshed every hour. I found online that the Cache-Control max-age header can be used for this, as below:
Cache-Control: max-age=3600
This tells CloudFront to keep the PDFs in its cache for 3600 seconds (1 hour).
But I am not sure where to put this code. Do I need to put this in the dispatcher? If yes, how? Can anyone please provide some code snippet for the same?
Also, we have included "expires.rules" file in the dispatcher which has the below code-
ExpiresActive on
ExpiresDefault "access plus 1 month"
Header append Cache-Control "public"
Header add X-ServiceProvider "Test"
#PDF
ExpiresByType application/pdf "access plus 1 hour"
Is it doing the same thing as max-age header?
It will be really helpful if someone can explain this.
Thanks!
The standard approach is to set the cache headers in Apache's VirtualHost definition. Normally the settings differ by file type and path. Also make sure that you differentiate between author and publisher.
Here are some examples:
# Cache JS+CSS with MD5 Hash for 30 days
SetEnvIf Request_URI "^.*(\.min)?\.[a-f0-9]{25,32}\.(js|css)$" immutable_resource=true
Header set Cache-Control "public, max-age=2592000" env=immutable_resource
# Cache Images for 30 days
SetEnvIf Request_URI "^/(etc|content)/.*\.(svg|png|gif|jpeg|jpg)$" image_resource=true
Header set Cache-Control "public, max-age=2592000" env=image_resource
# Cache Fonts for 30 days
SetEnvIf Request_URI "^/etc/.*\.(eot|ttf|woff|woff2)$" font_resource=true
Header set Cache-Control "public, max-age=2592000" env=font_resource
# Cache HTML documents for 2 hours (in this example everything is served with /content/...)
SetEnvIf Request_URI "^/content/myproject/.*\.html$" html_document=true
# Treat vanity URLs as HTML documents too
SetEnvIf Request_URI "^/[A-Za-z0-9]+(\.html)?$" html_document=true
Header set Cache-Control "public, max-age=7200" env=html_document
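For the PDF case in the question, a rule in the same style would match on the .pdf extension and set max-age=3600. The Python sketch below only mirrors how such path-based rules pick a header value, so the mapping can be checked outside Apache. The PDF pattern, the sample paths and the function name cache_control_for are assumptions, and the first-match ordering is a simplification of how Apache evaluates the directives.

```python
import re

# (pattern, Cache-Control value) pairs, checked in order.
# The first three patterns are copied from the Apache examples above;
# the PDF rule is a hypothetical addition for the one-hour requirement.
RULES = [
    (re.compile(r"^.*(\.min)?\.[a-f0-9]{25,32}\.(js|css)$"), "public, max-age=2592000"),
    (re.compile(r"^/(etc|content)/.*\.(svg|png|gif|jpeg|jpg)$"), "public, max-age=2592000"),
    (re.compile(r"^/content/myproject/.*\.html$"), "public, max-age=7200"),
    (re.compile(r"^/content/.*\.pdf$"), "public, max-age=3600"),  # assumed rule
]


def cache_control_for(path):
    """Return the Cache-Control value of the first matching rule, or None."""
    for pattern, value in RULES:
        if pattern.search(path):
            return value
    return None


print(cache_control_for("/content/myproject/docs/report.pdf"))  # public, max-age=3600
print(cache_control_for("/content/myproject/en/home.html"))     # public, max-age=7200
```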
I understand that I can use url_param / urlp to extract the query parameters from the URL that is requested, in HAProxy.
However, I need similar function for extracting parameters from the URL sent as HTTP Header field Referer. I guess url_param is only available for the requested URL, and not possible to use for HTTP Header values? If so, what other options do I have? I need to retrieve the value from query parameter and send it as specific HTTP Header to the backend server.
Sharing my solution, although I'm not sure it is the most efficient or accurate way. I solved it with a regex.
# Example HTTP Referer: http://myexample.com/users?user-id=12345
# ACL
acl is_uid_in_hdr_referer hdr_sub(Referer) -i user-id
# Set value from query param "user-id" from Referer header to custom header "user-id"
http-request set-header user-id %[req.hdr(Referer),regsub(.+?user-id=,,g)] if is_uid_in_hdr_referer
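To see what that substitution does to a given Referer value, the same regex can be tried outside HAProxy. The Python sketch below is only an illustration: it applies the identical pattern and compares it with proper URL parsing. The helper name user_id_from_referer and the extra page parameter are assumptions. It also shows that the regex keeps whatever follows the value when user-id is not the last parameter.

```python
import re
from urllib.parse import parse_qs, urlsplit

referer = "http://myexample.com/users?user-id=12345"

# Same substitution as the regsub converter: drop everything up to "user-id=".
by_regex = re.sub(r".+?user-id=", "", referer)


def user_id_from_referer(value):
    """Parse the Referer as a URL and return the user-id parameter, or None."""
    values = parse_qs(urlsplit(value).query).get("user-id")
    return values[0] if values else None


print(by_regex)                                          # 12345
print(user_id_from_referer(referer))                     # 12345
# With a second parameter the regex keeps the trailing part.
print(re.sub(r".+?user-id=", "", referer + "&page=2"))   # 12345&page=2
print(user_id_from_referer(referer + "&page=2"))         # 12345
```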
Currently we are using HAProxy as a software load balancer.
I have an assignment where I need to inspect each and every request coming into my application, look for a specific header (let's say the Accept header), and modify the value of that header from A --> B.
Could you please guide me on how I can do this using HAProxy?
Regards,
-Srini.
To replace one request header value with another, for example:
Accept: application/json # existing value
Accept: application/xml # desired value
Test the current value, then set the header to the desired value.
http-request set-header Accept application/xml if { hdr(accept) -m str application/json }
Using http-request set-header removes any/all existing headers with the same name, which is what you would want in this case. Using -m str specifies a case-sensitive string match on the value. Header name matching is always case-insensitive.
http://cbonte.github.io/haproxy-dconv/1.6/configuration.html#4-http-request
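The semantics described above can be modelled in a few lines. This Python sketch is not HAProxy code; it only imitates the rule under the stated behaviour (case-insensitive header names, case-sensitive exact value match, all existing headers of that name replaced by one). The function name rewrite_accept and the list-of-pairs representation are assumptions.

```python
def rewrite_accept(headers):
    """headers is a list of (name, value) pairs; returns a new list."""

    def values(raw):
        # Header values are treated as comma-separated lists.
        return [v.strip() for v in raw.split(",")]

    # Condition: some Accept header carries exactly "application/json"
    # (name compared case-insensitively, value case-sensitively).
    matched = any(
        name.lower() == "accept" and "application/json" in values(raw)
        for name, raw in headers
    )
    if not matched:
        return list(headers)

    # set-header: remove every existing Accept header, then add a single one.
    kept = [(n, v) for n, v in headers if n.lower() != "accept"]
    kept.append(("Accept", "application/xml"))
    return kept


print(rewrite_accept([("Host", "example.com"), ("accept", "application/json")]))
# [('Host', 'example.com'), ('Accept', 'application/xml')]
```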
I want to be able to detect when a response from the backend contains this:
testCookie="ASF#ED124312FASdf23er="; Version=1; Max-Age=0; Expires=Thu, 01 Jan 1970 00:00:10 GMT; Path=/; Secure
What I want to detect is that the response is going to set the expiration date to 01 Jan 1970 for the cookie called testCookie.
The challenge is that HAProxy, when using a fetch method to get a header, treats commas as the delimiter between separate values. The proposed alternative, using fhdr to fetch the complete header, does work, but it does not let you select a specific Set-Cookie header via a regex filter or similar. You have to rely on a fixed index of the Set-Cookie header to be fetched, and that is obviously not acceptable.
So, I have tried:
ACL with regex
acl is_expiration hdr_reg(Set-Cookie) <enter_regex_here>
This has the downside that the comma in the expiration date will cause haproxy to parse that value in 2 separate parts so you can either match the first part which gives you the correct cookie identifier or the second part which has the expiration date.
Full Header with var and separate acl
http-response set-var(txn.temp_var) res.fhdr(Set-Cookie)
acl is_expiration var(txn.temp_var) -m sub 01\ Jan\ 1970
This will store the whole text of the Set-Cookie header in a temporary variable, and then I can do a substring match or a regular expression match on that. But I still run into the problem of the comma, and this approach is error-prone because it will take the first Set-Cookie header found, or the one found at a specific index that I specify.
Workaround
I could match by the Max-Age but if the server implementation changes and Expires and Max-Age switch places, I am screwed.
acl is_expiration hdr_reg(Set-Cookie) testCookie.*Max-Age=0
So.....heeeeelp!!!
After getting my head a bit out of the issue itself, I found an answer showing what the syntax should look like for an ACL using the full header fetch command.
Bottom line....the documentation is kind of quirky and the solution looks like this:
acl is_expiration res.fhdr(Set-Cookie) -m reg testCookie.*Expires=Thu,\ 01\ Jan\ 1970
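The comma problem is easy to reproduce outside HAProxy. The Python sketch below is only an illustration of the idea: it splits the header value on commas, the way the comma-delimited fetch is described above, and compares that with matching the same regex against the full value. The function names are hypothetical.

```python
import re

SET_COOKIE = (
    'testCookie="ASF#ED124312FASdf23er="; Version=1; Max-Age=0; '
    "Expires=Thu, 01 Jan 1970 00:00:10 GMT; Path=/; Secure"
)
PATTERN = re.compile(r"testCookie.*Expires=Thu, 01 Jan 1970")


def matches_any_comma_part(header):
    """Match against each comma-separated part, like a comma-splitting fetch."""
    return any(PATTERN.search(part) for part in header.split(","))


def matches_full_header(header):
    """Match against the complete header value, like a full-header fetch."""
    return PATTERN.search(header) is not None


print(len(SET_COOKIE.split(",")))          # 2
print(matches_any_comma_part(SET_COOKIE))  # False
print(matches_full_header(SET_COOKIE))     # True
```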
I am to apply Cache-Control: must-revalidate,no-cache,no-store to all responses from our backend REST services. I have two questions about it:
Is it common to do so? For some reason I was under the impression that it's not necessary, but I have no source to back this claim (yet).
Is the value I mentioned above really sufficient, or should I set more?
Edit: found this: https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers#cache-prevention. It says browsers may choose to cache when nothing is explicitly configured, so it means yes, it should be configured if I want to make sure cache is disabled.
Short: yes, caches may cache the response even if no explicit controls are present; you need to explicitly disallow it.
The HTTP caching specification Section 3 lists when the response is forbidden to be cached. It suggests that the response may be cached as long as the response code is cacheable. A list of cacheable response codes is in the HTTP specification section 6.1:
Responses with status codes that are defined as cacheable by default
(e.g., 200, 203, 204, 206, 300, 301, 404, 405, 410, 414, and 501 in
this specification) can be reused by a cache with heuristic
expiration unless otherwise indicated by the method definition or
explicit cache controls...
"Heuristic expiration" is defined as the expiration time assigned when no explicit controls are present. (HTTP caching specification section 4.2.)
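As a rough illustration of that rule, the Python sketch below checks whether a response without explicit freshness information could still be reused heuristically. It is a deliberate simplification: it only looks at the status code and a no-store directive, and ignores the request method, private, Authorization and the other conditions in the specification. The function name may_be_cached is an assumption.

```python
# Status codes quoted above as cacheable by default.
HEURISTICALLY_CACHEABLE = {200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501}


def may_be_cached(status, cache_control=None):
    """Simplified check: can a cache store and heuristically reuse this response?"""
    directives = {
        d.strip().lower() for d in (cache_control or "").split(",") if d.strip()
    }
    if "no-store" in directives:
        return False  # storing is explicitly forbidden
    return status in HEURISTICALLY_CACHEABLE


print(may_be_cached(200))                                        # True
print(may_be_cached(200, "must-revalidate,no-cache,no-store"))   # False
print(may_be_cached(500))                                        # False
```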
I think it's disabled by default. There are mechanisms though to enable caching to enhance performance:
Here's a good explanation with examples how to enable caching:
[Source: Heroku Dev Center]
Time-based cache headers
In HTTP 1.1 the Cache-Control header specifies the resource caching behavior as well as the max age the resource can be cached. As an example, this response would be cached for one day:
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: private, max-age=86400
Last-Modified: Thu, 07 Feb 2013 11:56 EST
Here is a list of the available Cache-Control tokens and their meaning:
private only clients (mostly the browser) and no one else in the chain (like a proxy) should cache this
public any entity in the chain can cache this
no-cache may be stored, but must be revalidated with the origin server before each reuse
no-store must not be stored by any cache, neither in memory nor on disk
no-transform the resource should not be modified (for example, an image shrunk by a proxy)
max-age how long the resource is valid (measured in seconds)
s-maxage same as max-age, but this value applies only to shared caches (non-clients such as proxies)
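To make the token list concrete, here is a minimal Python sketch that splits a Cache-Control value into its directives. The function name parse_cache_control is an assumption, and the parser is intentionally naive: it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Return a dict mapping directive names to their argument (or None)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("private, max-age=86400"))
# {'private': None, 'max-age': '86400'}
```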
And here is an example with a CDI Cache Control annotation:
[Source: abhirockzz.wordpress.com]
@Path("/testcache")
public class RESTfulResource {

    @Inject
    @CachControlConfig(maxAge = 20)
    CacheControl cc;

    @GET
    @Produces("text/plain")
    public Response find() {
        return Response.ok(UUID.randomUUID().toString()).cacheControl(cc).build();
    }
}
I'm trying to redirect a URL in nginx to a non-HTTP protocol such as test://123456 when I go to test.com/123456.
I've tried the following rewrite:
rewrite ^/(.*)$ test://$1 permanent;
It works, but the weird part is that it adds an HTML body to the response, which messes up my code. Is there any way to do this without the HTML body, or is there another recommended way to do such a rewrite?
HTTP/1.1 301 Moved Permanently
Server: nginx/1.1.19
Date: Tue, 30 Apr 2013 14:14:47 GMT
Content-Type: text/html
Content-Length: 185
Connection: keep-alive
Location: test://123456
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.1.19</center>
</body>
</html>
This is not weird, this is how it’s supposed to be.
RFC 2616 specifies that the entity bodies you want to remove should be present.
10.3.2 301 Moved Permanently
The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
and...
10.3.3 302 Found
The temporary URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
SHOULD, in this context, is defined in RFC 2119:
This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
Answer taken from "NGINX 301 and 302 serving small nginx document body. Any way to remove this behaviour?"
Of course you can still do it. One possibility would be to proxy the request and change the request's method from GET to HEAD. That should ensure that only HTTP headers are sent.
This is untested, but it should be a good starting point:
server {
    listen 8080;
    server_name localhost;
    return 301 test://$request_uri;
}
server {
    listen 80;
    server_name example.com;
    location / {
        proxy_method HEAD;
        proxy_pass http://localhost:8080;
    }
}
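One detail worth checking when trying this: nginx's $request_uri holds the original request URI including its leading slash (and any query string), so test://$request_uri would yield test:///123456 for a request to /123456 rather than the test://123456 from the question. The Python sketch below only illustrates that string composition; both function names are hypothetical.

```python
def location_as_configured(request_uri, scheme="test"):
    """Mimic 'test://$request_uri': the leading slash of the URI is kept."""
    return f"{scheme}://{request_uri}"


def location_without_leading_slash(request_uri, scheme="test"):
    """Strip the leading slash first to get the form used in the question."""
    return f"{scheme}://{request_uri.lstrip('/')}"


print(location_as_configured("/123456"))           # test:///123456
print(location_without_leading_slash("/123456"))   # test://123456
```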
Also of interest in this context: "NGINX convert HEAD to GET requests".