I would like to use Nginx as the CDN for a file hosting system. I found a great module for nginx that allows PostgreSQL connections (https://github.com/FRiCKLE/ngx_postgres) and it works really well. However, when I try to use it together with the alias directive, nginx seems to ignore the alias (and the file download) and instead gives me an empty file.
My idea is to take the UUID from the URL, find the correct file with a query, and then use the returned details to set the filename header, so that the user's client automatically saves the download under the original filename instead of the UUID.
Here is the code.
location /dl {
    postgres_output none;
    postgres_pass database;
    postgres_query "SELECT * FROM \"Files\" WHERE uuid = '$args'";
    postgres_set $filename 0 name;
    alias /home/ubuntu/fileStorage;
    add_header Content-Disposition "attachment; filename=$filename";
}
I think the postgres directives are somehow taking over this block. Is there a way I can run the postgres query without affecting the download part of the block?
It seems that you expect the line
add_header Content-Disposition "attachment; filename=$filename";
will cause the browser to download the file given by $filename. This is not how the Content-Disposition header works: it simply tells the browser to save the response body it receives as a file with that name. You're going to have to do something additional to get the proper content to the client. Perhaps what you really want is to issue a redirect?
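One way to get both the lookup and the download (a sketch only, and it does not use ngx_postgres: it assumes an application backend that resolves the UUID) is nginx's X-Accel-Redirect mechanism. The backend answers with the headers, and nginx then streams the file from disk:

# hypothetical config; the upstream name "backend" and the paths are assumptions
location /dl {
    proxy_pass http://backend;        # backend looks up the UUID and replies with
                                      #   X-Accel-Redirect: /files/<uuid>
                                      #   Content-Disposition: attachment; filename="original-name.ext"
}

location /files/ {
    internal;                         # only reachable via X-Accel-Redirect
    alias /home/ubuntu/fileStorage/;  # nginx serves the stored file itself
}

The Content-Disposition header set by the backend is passed on to the client, so the browser saves the file under its original name.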
Related
I'm trying to do a rewrite and redirect in HAProxy. I've been trying this; it works to some extent, but it's not 100% what I want it to do:
acl old url_beg /site/ab
http-request redirect location /new/%[query] if old
the url can be for example
https://host/site/ab/xx
https://host/site/ab/yyyy
https://host/site/ab/zzzzzz
https://host/site/ab/zzzzzz/asdajshdjasd
I am looking to grab the marked part (xx, yyyy, zzzzzz above) and simply redirect the user to https://host/new/<marked part>.
Any string that comes after the marked part can be trashed, for example "/asdajshdjasd" in the last example.
Any idea how to accomplish this? Thank you!!
If I understand correctly, you want to split the path part of the URL and take its 4th field.
In the string foo1/foo2/foo3/foo4/foo5 you want only foo4.
This should work for you:
acl old path_beg /site/ab
http-request redirect location /new/%[path,field(4,/)] if old
It may be confusing that you want the 3rd directory from the path but take the 4th field here; that's because when you split /foo2/foo3/foo4/foo5 by /, the first field is empty.
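For example, for https://host/site/ab/zzzzzz/asdajshdjasd the path is /site/ab/zzzzzz/asdajshdjasd; splitting it on / gives an empty first field, then site, ab, zzzzzz, asdajshdjasd, so %[path,field(4,/)] evaluates to zzzzzz and the redirect goes to /new/zzzzzz.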
The field converter is documented here: https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#7.3.1-field
Other notes:
%[query] would return the query part of the URL, which is everything after the ? character, and your examples don't have a query part at all.
In my tests, url contained scheme://hostname:port/path, so the acl old url_beg /site/ab never matched; path is the right sample fetch for that.
I created my own “404 Page not found” error page on a TYPO3 website and implemented it via the /typo3conf/LocalConfiguration.php as follows, using the page’s Speaking URL path:
return [
    ...
    'FE' => [
        ...
        'pageNotFound_handling' => '/page-not-found/',
    ],
];
Now when I call a non-existing page, the error page gets displayed, but there is a 4-digit alphanumeric value (hexadecimal, as far as I've seen) BEFORE the HTML source code and a "0" AFTER it. Example (the number at the beginning is different after most reloads):
37b3
<!DOCTYPE html>
...
</html>
0
When calling the error page URL itself the page is returned correctly without those numbers.
Having the RealURL extension activated or deactivated does not make a difference.
Thanks a lot in advance!
I added the full description from the install tool and I guess we might find the solution there.
How TYPO3 should handle requests for non-existing/accessible pages.
empty (default)
The next visible page upwards in the page tree is shown.
'true' or '1'
An error message is shown.
String
Static HTML file to show (reads content and outputs with correct headers), e.g. notfound.html or http://www.example.org/errors/notfound.html.
Prefix "REDIRECT:"
If prefixed with "REDIRECT:" it will redirect to the URL/script after the prefix.
Prefix "READFILE:"
If prefixed with "READFILE" then it will expect the remaining string to be a HTML file which will be read and outputted directly after having the marker "###CURRENT_URL###" substituted with REQUEST_URI and ###REASON### with reason text, for example: READFILE:fileadmin/notfound.html.
Prefix "USER_FUNCTION:"
If prefixed with "USER_FUNCTION:" a user function is called, e.g. USER_FUNCTION:fileadmin/class.user_notfound.php:user_notFound->pageNotFound where the file must contain a class user_notFound with a method pageNotFound() inside with two parameters $param and $ref.
What you configured:
You're passing a string, so TYPO3 expects to find a file - which you don't have, because what you configured is more like a URL path.
From what you're trying to achieve, I'd go with REDIRECT:/page-not-found/.
Thanks for pointing this one out, by the way; I will remove the plain-string configuration from the core, since it does not make sense to have more people trip over this pitfall.
In short: change the following line in the FE section of your LocalConfiguration.php:
'pageNotFound_handling' => '/your404page.html',
to
'pageNotFound_handling' => 'REDIRECT:/your404page.html',
Cause
The actual cause is a combination of chunked transfer encoding and TYPO3 not being able to decode it in some cases. In your case the page-not-found handler eventually uses GeneralUtility::getUrl() to retrieve the error page.
If you have [SYS][curlUse] enabled it will use cUrl to retrieve the page and there is no problem.
If you don't have [SYS][curlUse] enabled, it will open a socket, read the headers and then read the rest of the body. If the webserver uses chunked transfer encoding, the body consists of blocks of data, and each block starts with a line giving its length in hexadecimal. The content ends with an empty block (with, of course, a length line of "0").
cUrl apparently knows how to decode chunked data.
getUrl() itself does not know how to handle chunked data and uses the content as is as the page content.
In TYPO3 8 LTS the guzzle library is used to handle HTTP requests. In the guzzle code I can't find anything about handling chunked data. Guzzle will check if the cUrl PHP extension is present and use that as preferred transport. In most installations cUrl is present and since this decodes chunked data automagically no problem is visible. I have to test guzzle with PHP that has cUrl disabled to see if the issue is also present in v8/master.
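To illustrate (plain PHP, not TYPO3 code; the sample body is made up): a chunked body prefixes every block with its length in hexadecimal and ends with a "0" block, which is exactly the stray text you see when nobody decodes it.

<?php
// What a chunked response body looks like on the wire:
$raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";

// A client has to strip the hex length lines itself:
function decodeChunked(string $raw): string
{
    $decoded = '';
    $offset  = 0;
    while (true) {
        $lineEnd = strpos($raw, "\r\n", $offset);
        $length  = hexdec(substr($raw, $offset, $lineEnd - $offset)); // chunk size in hex
        if ($length === 0) {
            break;                                                    // final "0" chunk
        }
        $decoded .= substr($raw, $lineEnd + 2, $length);
        $offset   = $lineEnd + 2 + $length + 2;                       // skip chunk data + trailing CRLF
    }
    return $decoded;
}

echo decodeChunked($raw); // "Wikipedia"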
Workaround/solution
If the PHP extension cUrl is enabled in your installation you can simply set [SYS][curlUse] in the Install Tool. The numbers around the 404 page content will disappear.
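For reference, a minimal sketch of that setting in typo3conf/LocalConfiguration.php, in the same style as the excerpt from the question (surrounding keys elided):

return [
    ...
    'SYS' => [
        ...
        'curlUse' => '1',
    ],
];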
I am creating an HTTP client downloader in Python. I am able to download a file such as http://www.google.com/images/srpr/logo11w.png just fine. However, I'm not sure what to actually name the thing.
There is of course the filename at the end of the URL, but is this always reliable?
If I recall correctly, wget uses the following heuristic:
If a Content-Disposition header exists, get the filename from there.
If the filename component of the URL exists (e.g. http://myserver/filename), use that.
If there is no filename component (e.g. http://www.google.com), derive the filename from the Content-Type header (such as index.html for text/html)
In all cases, if this filename is already present in the directory use a numerical suffix, such as index (1).html, or overwrite, depending on configuration.
There are plenty of other flags that control other heuristics, such as creating .html for ASP/DHTML content-types.
In short, it really depends how far you want to go. For most people, doing the first two + basic Content-Type->name mapping should be enough.
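A rough sketch of that heuristic in Python (this is not wget's code; the function name and the very crude Content-Disposition parsing are just illustrative):

import mimetypes
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def pick_filename(url):
    response = urlopen(url)

    # 1. Filename from the Content-Disposition header, if the server sent one
    disposition = response.headers.get("Content-Disposition", "")
    if "filename=" in disposition:
        return disposition.split("filename=")[-1].strip('" ')

    # 2. Last path component of the URL, if there is one
    name = os.path.basename(urlsplit(url).path)
    if name:
        return name

    # 3. Fall back to a name derived from the Content-Type (e.g. index.html)
    extension = mimetypes.guess_extension(response.headers.get_content_type()) or ""
    return "index" + extension


print(pick_filename("http://www.google.com/images/srpr/logo11w.png"))  # logo11w.png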
Are we allowed to link to files directly from GitHub?
<link rel="stylesheet" href="https://raw.github.com/username/project/master/style.css"/>
<script src="https://raw.github.com/username/project/master/script.js"></script>
I know this is allowed on Google Code. This way I don't have to worry about updating a local file.
The great service RawGit was already mentioned, but I'll throw another into the ring: GitCDN.link
Benefits:
Lets you link to specific commits, as well as auto-get the latest (aka master)
Incurs no damage from high traffic volumes; RawGit asks that its dev.rawgit.com links be used only during development, whereas GitCDN gives you access to the latest version without the danger of the servers exploding
Gives you the option of auto-minifying your HTML, CSS and JavaScript, or serving it as written (https://min.gitcdn.link).
Adds compression (GZip)
Adds all the correct headers (Content-Type, Cache-Control, ETag, etc.)
Full disclosure, I'm a project maintainer at GitCDN.link
You can use the external service rawgithub.com. Just remove the dot between the words 'raw' and 'github' (https://raw.github.com/.. => https://rawgithub.com/) and use it. You can find more info in this question.
However, according to the rawgithub website it will be shutting down at the end of October 2019.
You can link directly to raw files, but it's best not to do it, since the raw files are always sent with a text/plain header and can cause loading problems.
You need to carry out the following steps:
Get the raw URL of the file from GitHub, which is something like https://raw.githubusercontent.com/username/folder/example.css
Visit http://rawgit.com/ and paste the raw URL above into the input box. It will generate two URLs, one for development and one for production use.
Copy either of them and you are done.
The file will be served through a CDN. You can also use gist URLs.
GitHub Pages: https://yourusername.github.io/script.js
GitHub repo raw files: https://github.com/yourusername/yourusername.github.io/blob/master/script.js
Use GitHub Pages, DO NOT use raw files.
Reason:
GitHub Pages is served through a CDN; raw files are not. Accessing raw files hits the GitHub servers directly and increases server load.
Add a branch to your project named "gh-pages" and then (shortly after branching) you'll be able to use a direct URL such as https://username.github.io/project/master/style.css (using your URL, and assuming "style.css" is a file in the "master" folder in the root of your "project" repository... and that your GitHub account is "username").
For those who ended up on this post and just want to get the raw link for an image hosted on GitHub:
If it is an image, you can just add '?raw=true' to the end of the link to the file.
E.g.
Original link:
https://github.com/githubusername/repo_name/blob/master/20160309_212617-1.png
Raw link:
https://github.com/githubusername/repo_name/blob/master/20160309_212617-1.png?raw=true
Use jsdelivr.com
Copied directly from https://www.jsdelivr.com/?docs=gh:
load any GitHub release, commit, or branch
note: we recommend using npm for projects that support it
https://cdn.jsdelivr.net/gh/user/repo#version/file
load jQuery v3.2.1
https://cdn.jsdelivr.net/gh/jquery/jquery#3.2.1/dist/jquery.min.js
use a version range instead of a specific version
https://cdn.jsdelivr.net/gh/jquery/jquery#3.2/dist/jquery.min.js
https://cdn.jsdelivr.net/gh/jquery/jquery#3/dist/jquery.min.js
omit the version completely to get the latest one
you should NOT use this in production
https://cdn.jsdelivr.net/gh/jquery/jquery/dist/jquery.min.js
add ".min" to any JS/CSS file to get a minified version
if one doesn't exist, we'll generate it for you
https://cdn.jsdelivr.net/gh/jquery/jquery#3.2.1/src/core.min.js
add / at the end to get a directory listing
https://cdn.jsdelivr.net/gh/jquery/jquery/
After searching for this same functionality, I ended up writing my own PHP script to act as a proxy. The trouble I kept running into is that even when you get the RAW version/link from GitHub and reference it in your own page, the header sent over is 'text/plain', so Chrome was not executing my JavaScript file from GitHub. I also didn't like the other links posted for using third-party services, because of the obvious security/tampering issues possible.
So using this script, I can pass over the RAW link from Github, have the script set the correct headers, and then output the file as if it were coming from my own server. This script can also be used with a secure application to pull in non-secure scripts without throwing SSL errors warning of "Non-secure links used".
Linking:
<script src="proxy.php?link=https://raw.githubusercontent.com/UserName/repo/master/my_script.js"></script>
proxy.php
<?php
###################################################################################################################
#
# This script can take two URL variables
#
# "type"
# OPTIONAL
# STRING
# Sets the type of file that is output
#
# "link"
# REQUIRED
# STRING
# The link to grab and output through this proxy script
#
###################################################################################################################
# First we need to set the headers for the output file
# So check to see if the type is specified first and if so, then set according to what is being requested
if(isset($_GET['type']) && $_GET['type'] != ''){
    switch($_GET['type']){
        case 'css':
            header('Content-Type: text/css');
            break;
        case 'js':
            header('Content-Type: text/javascript');
            break;
        case 'json':
            header('Content-Type: application/json');
            break;
        case 'rss':
            header('Content-Type: application/rss+xml; charset=ISO-8859-1');
            break;
        case 'xml':
            header('Content-Type: text/xml');
            break;
        default:
            header('Content-Type: text/plain');
            break;
    }
# Otherwise, try and determine what file type should be output by the file extension from the link
}else{
    # See if we can find a file type in the link specified and set the headers accordingly
    # If a css file extension is found, then set the headers to css format
    if(strstr($_GET['link'], '.css') != FALSE){
        header('Content-Type: text/css');
    # If a javascript file extension is found, then set the headers to javascript format
    }elseif(strstr($_GET['link'], '.js') != FALSE){
        header('Content-Type: text/javascript');
    # If a json file extension is found, then set the headers to json format
    }elseif(strstr($_GET['link'], '.json') != FALSE){
        header('Content-Type: application/json');
    # If an rss file extension is found, then set the headers to rss format
    }elseif(strstr($_GET['link'], '.rss') != FALSE){
        header('Content-Type: application/rss+xml; charset=ISO-8859-1');
    # If an xml file extension is found, then set the headers to xml format
    }elseif(strstr($_GET['link'], '.xml') != FALSE){
        header('Content-Type: text/xml');
    # If we still haven't found a suitable file extension, then just set the headers to plain text format
    }else{
        header('Content-Type: text/plain');
    }
}
# Now get the contents of the page we're wanting
$contents = file_get_contents($_GET['link']);
# And finally, spit everything out
echo $contents;
?>
If your webserver has allow_url_include enabled, GitHub serving the files as raw text/plain is not a problem, since you can pull the file into a PHP script first and set its headers to the proper MIME type.
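As a minimal sketch of that idea (the URL is only an example; note that readfile() on a URL relies on allow_url_fopen, while include of a URL is what needs allow_url_include):

<?php
header('Content-Type: text/javascript');  // override GitHub's text/plain before any output
readfile('https://raw.githubusercontent.com/username/project/master/script.js');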
I think this is a very easy one, but I can't seem to get it right. Basically, I'm trying to use Rack middleware to set a default Cache-Control header into all responses served by my Sinatra app. It looks like Rack::ResponseHeaders should be able to do exactly what I need, but I get an error when attempting to use the syntax demonstrated here in my rackup file:
use Rack::ResponseHeaders do |headers|
  headers['X-Foo'] = 'bar'
  headers.delete('X-Baz')
end
I was able to get Rack::Cache to work successfully as follows:
use Rack::Cache,
  :default_ttl => 3600
However, this doesn't achieve exactly the output I want, whereas Rack::ResponseHeaders gives fine-grained control of the headers.
FYI, my site is hosted on Heroku, and the required Rack gems are specified in my .gems manifest.
Thanks!
Update: After doing some research, it looks like the first issue is that Rack::ResponseHeaders is not found in the version of rack-contrib (0.9.2) which was installed. I'll start by looking into that.
In case anyone's interested, I was able to get this working. It didn't look like there would be an easy way to install rack-contrib-0.9.3 on Heroku, but the only file I needed was response_headers.rb, so I simply copied this into my project directory and edited my rackup as follows:
require 'rack/contrib/response_headers'
# set default cache-control header if not set by Sinatra
use Rack::ResponseHeaders do |headers|
  if not headers['Cache-Control']
    headers['Cache-Control'] = "public, max-age=3600"
  end
end
This sets a default max-age of 1 hr on objects for which I'm not specifying an explicit Cache-Control header in Sinatra – namely, static assets.