I installed ARR on my local machine and set up a server farm with a single server in it (localhost). I added two redirect routing rules. However, it doesn't do the redirect. My Default Web Site has an additional binding: localhost.mycompany.com. I tried putting that in the server farm and it still didn't work. The redirect rules look like this:
Uses wildcards in the pattern
inbound pattern: */path2/*/*/*/method*
Redirect URL: /path1/path2/api/item/method
EDIT: When I use Test Pattern and enter one of the URLs against my rule, it parses it successfully.
I also tried putting the full hostname (e.g. http://localhost.mycompany.com/...) in the redirect rule, as well as using the alias localServerFarm (which is the name of the server farm). Nothing worked.
The module is "working" in some respect because when I had a broken rule it sure told me about it when I tried to load any url on localhost. Once I fixed the rule, I no longer got the error message but it doesn't do any redirection.
This was just a matter of getting the redirect rule correct. In the rules list there is a column named Input, and its setting is URL Path. So the only input to the pattern match is the path part of the URL, not including the / at the beginning. All I had to do was change the */ at the beginning of my pattern to just *, e.g. */path2/*/*/*/method* changed to *path2/*/*/*/method*.
I don't know if there's any other setting for the Input field (it isn't settable in the rule definition screen), but for anyone creating rules, remember that only the path, without a leading /, is used for evaluating the pattern match. Note that if you're matching from the beginning of the path, as I am, you don't need the * at the beginning of the pattern. However, if you go into the Test Pattern screen and paste a full URL into the Input data, it will not extract the path part of that URL and feed it to the pattern match; it uses the entire string, so your pattern needs a * at the beginning to match there.
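The difference between the rule's URL Path input and a full URL pasted into Test Pattern can be approximated in Python. This is a sketch under assumptions, not how IIS implements it: `url_path_input` and `wildcard_match` are hypothetical helpers, `*` is assumed to be the only wildcard character, and matching is assumed to be case-insensitive (the rule's "Ignore case" option).

```python
import re
from urllib.parse import urlsplit


def url_path_input(url):
    """Approximate the "URL Path" rule input: the path without its leading slash (no query string)."""
    path = urlsplit(url).path
    return path[1:] if path.startswith("/") else path


def wildcard_match(pattern, text):
    """Assumed semantics: "*" matches any run of characters, the whole text must match, case is ignored."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, text, re.IGNORECASE) is not None


url = "http://localhost.mycompany.com/path2/a/b/c/methodGet"  # hypothetical request URL
for pattern in ("*/path2/*/*/*/method*", "*path2/*/*/*/method*", "path2/*/*/*/method*"):
    # (pattern, matches the URL Path input, matches the full URL as typed into Test Pattern)
    print(pattern, wildcard_match(pattern, url_path_input(url)), wildcard_match(pattern, url))
```

With this model, the original pattern */path2/*/*/*/method* matches the full URL but not the path input path2/a/b/c/methodGet, which is consistent with Test Pattern reporting success while the live rule never fired.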
I need to place a Mojolicious app behind an Apache reverse proxy. I've been unable to get Mojolicious to generate working URLs when behind the proxy.
I'm using Mojolicious 6.14 with Perl 5.18.1.
Here's my Apache reverse proxy configuration which I set based on https://github.com/kraih/mojo/wiki/Apache-deployment (in the path section).
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
ProxyRequests Off
ProxyPreserveHost On
ProxyPass /app1 http://localhost:3000/ keepalive=On
ProxyPassReverse /app1 http://localhost:3000/
RequestHeader set X-Forwarded-HTTPS "0"
Here's my test case.
use 5.014;
use Mojolicious::Lite;
app->hook('before_dispatch' => sub {
my $self = shift;
if ($self->req->headers->header('X-Forwarded-Host')) {
#Proxy Path setting
my $path = shift @{$self->req->url->path->parts};
push @{$self->req->url->base->path->parts}, $path;
}
});
any '/' => sub {
my $c = shift;
$c->render('index');
};
any '/test' => sub {
my $c = shift;
$c->render('test');
};
app->start;
__DATA__
@@ index.html.ep
<!DOCTYPE html>
<html>
<head><title>Index Page</title></head>
<body>
<p>Index page</p>
<p>
%= link_to 'Go to Test Page' => '/test'
</p>
</body>
</html>
@@ test.html.ep
<!DOCTYPE html>
<html>
<head><title>Test Page</title></head>
<body>
<p>Test page</p>
<p>
%= link_to 'Return to home page' => '/'
</p>
</body>
</html>
I can see the index page when I access http://www.example.com/app1, but the link to the test page is incorrect. The link is //test when I expected it to be http://www.example.com/app1/test.
Here's the HTML output from the test case.
<!DOCTYPE html>
<html>
<head><title>Index Page</title></head>
<body>
<p>Index page</p>
<p>
<a href="//test">Go to Test Page</a>
</p>
</body>
</html>
How can I tell Mojolicious what the base URL is for my app so it generates the correct links?
You may need to change the ProxyPass target to http://localhost:3000/app1 in your Apache config:
ProxyPass /app1 http://localhost:3000/app1 keepalive=On
ProxyPassReverse /app1 http://localhost:3000/app1
That's a good question! Judging by some of the answers here and elsewhere, there seems to be a general lack of understanding of how the Apache reverse proxy settings affect the Mojolicious application and what the hook is supposed to do.
You've received an answer that's basically correct, but it begins with "Maybe need to replace server proxy pass..." and doesn't provide any explanation. A trial-and-error approach may or may not work for you. If your hook works differently, it probably won't.
Apache reverse proxy
This is your reverse proxy configuration (trailing slash removed, see below):
ProxyPass /app1 http://localhost:3000 keepalive=On
Quoting from the Apache documentation:
Suppose the local server has address http://example.com/; then
ProxyPass /mirror/foo/ http://backend.example.com/
will cause a local request for http://example.com/mirror/foo/bar to be internally converted into a proxy request to http://backend.example.com/bar.
Now, assuming your Apache is listening on localhost:80 and your (Morbo) application server is listening on port 3000, a request to http://localhost/app1 is received by Apache and forwarded to your application as /. The app1 prefix has been lost, which is why it's missing from the base url, i.e., it's missing from all the links. To fix urls generated by the application, this prefix must be added to the base url, which leads us to the hook.
Hook
This is your hook function:
if ($self->req->headers->header('X-Forwarded-Host')) { # 1. if
my $path = shift @{$self->req->url->path->parts}; # 2. shift
push @{$self->req->url->base->path->parts}, $path; # 3. push
}
This hook is supposed to fix the base url. As explained above, the app1 prefix needs to be added to the base url, which is prepended to all generated urls. If one of your templates links to /test, the base url should look like /app1 to get the final url /app1/test.
This is what your hook does:
1. By checking for the X-Forwarded-Host header, you make sure to only modify the base url if the request came through the reverse proxy. This works because the mod_proxy_http module automatically sets that header. Without that check, you wouldn't be able to access your application server directly under localhost:3000; all urls would be broken.
In fact, I asked a question about how this distinction can be made reliably, i.e., how to fix the url prefix when using a reverse proxy without breaking requests that go directly to the application server. Unfortunately, most of the answers I received were wrong. But I believe checking for X-Forwarded-Host is good enough, as it's set by Apache and not by Morbo or Hypnotoad. It's set by reverse proxies, which is precisely what you're looking for.
2. This shift is supposed to extract the prefix from the request url.
This is necessary because, strictly speaking, appending the application prefix to the ProxyPass directive manipulates the final request url, so your application receives a request for /app1/. Of course, there's no route for that address, because the router in your application doesn't know that /app1 is the prefix of that instance rather than a relative application url.
Clearly, adding the hard-coded prefix /app1 to all templates (as some might be tempted to do) would not work if you deployed another copy of the same application under /app2. Even if you didn't, you'd still have to change all the links if your provider forced you to change the app1 prefix to app_one. This is why the prefix is picked up in that hook, stored to make links work (see #3) and then removed from the request url to make the router happy.
3. This is where the /app1 prefix, a single path token, is appended to the base url. The base url is prepended to urls generated in your templates. This is what turns /test into /app1/test (if the request came through the reverse proxy).
In your case, /test is turned into //test because you're missing the prefix. I've explained that at the end of this answer.
Fix reverse proxy
That being said, your reverse proxy needs to manipulate the request url to include the prefix in order to make the hook work:
ProxyPass /app1 http://localhost:3000/app1
After this modification, your hook works:
1. It modifies the base url only if a reverse proxy header is set, because the modification is only necessary when a reverse proxy is used.
2. All requests going to the Mojolicious application will have the /app1 prefix, e.g., /app1/test. In this step, the prefix is removed to turn the url into /test.
3. The prefix removed in step 2 is appended to the base url, which is later used to generate links.
This should explain why you need to add the application prefix to the ProxyPass line. Without that explanation, someone else might try to do just that without success because they might have a different hook function.
Slashes
A single slash can break everything and cause most requests to fail with error 404.
Note that the local target url in your ProxyPass line (second argument) has a trailing slash but the path argument doesn't. If those don't match, you might end up with double slashes in the request url and some requests could fail.
From the Apache documentation:
If the first argument ends with a trailing /, the second argument should also end with a trailing /, and vice versa. Otherwise, the resulting requests to the backend may miss some needed slashes and do not deliver the expected results.
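The slash behaviour can be reproduced with a deliberately simplified Python model that treats ProxyPass as plain prefix substitution, as in the documentation example quoted above. The function name `proxy_pass` is hypothetical, and real mod_proxy handles more cases than this sketch.

```python
def proxy_pass(request_path, local_prefix, backend_url):
    """Return the backend URL for request_path, or None if the local prefix doesn't match."""
    if not request_path.startswith(local_prefix):
        return None
    # Everything after the local prefix is appended verbatim to the backend URL,
    # so a trailing slash on only one side shows up as "//" in the forwarded path.
    return backend_url + request_path[len(local_prefix):]


for target in ("http://localhost:3000/", "http://localhost:3000", "http://localhost:3000/app1"):
    print(target, "->", proxy_pass("/app1/test", "/app1", target))
```

For a request to /app1/test, the three targets yield http://localhost:3000//test (double slash), http://localhost:3000/test (prefix lost) and http://localhost:3000/app1/test (prefix kept for the hook).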
Now, if you remove the trailing slash but forget the prefix...
ProxyPass /app1 http://localhost:3000
... generated urls will still have two leading slashes: url_for '/test' = //test
That's because you're appending undef to the base url where you want to append the application prefix.
What happens is that in step 2 (see above), you extract the prefix, assuming the application is running exactly one level below the document root, i.e., your prefix is something like app1 and not apps/app1 (in which case the shift/push routine would have to run twice). But there's no prefix in the ProxyPass directive, so your application sees something like /; in other words, there's nothing to extract from parts. And there's no safeguard in the code either, so you end up pushing undef onto the parts array of the base url. When you then generate a url, Mojolicious adds an extra slash for that undef element, which is why you get //test. The parts array looks like this:
"parts" => [
undef,
"test"
],
To fix this double slash error, you can add a safeguard to your hook:
my $path = shift @{$self->req->url->path->parts};
if ($path) { # safeguard
push @{$self->req->url->base->path->parts}, $path;
}
Of course, as long as your reverse proxy configuration has the prefix in it, $path should always be defined.
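To see why a missing prefix produces //test and what the safeguard changes, here is a rough Python model of the hook and of url_for operating on plain lists. It is not Mojolicious code: `apply_hook`, `render_path` and `url_for` are hypothetical names, Python's None stands in for Perl's undef, and the rendering is a simplification of what Mojo::Path does.

```python
def apply_hook(request_parts, base_parts, forwarded_host, safeguard=True):
    """Model of the before_dispatch hook; mutates both lists in place."""
    if not forwarded_host:  # step 1: only touch requests that came through the proxy
        return
    # step 2: like Perl's shift, taking from an empty list yields "undef" (None here)
    prefix = request_parts.pop(0) if request_parts else None
    # step 3: push the prefix onto the base path, optionally guarded against undef
    if prefix or not safeguard:
        base_parts.append(prefix)


def render_path(parts):
    """Join parts with "/" after a leading slash; an undef part renders as an empty segment."""
    return "/" + "/".join("" if part is None else part for part in parts)


def url_for(base_parts, target):
    """Prepend the base path parts to the target's parts (cf. the unshift in url_for)."""
    return render_path(base_parts + [p for p in target.split("/") if p])


request, base = [], []  # ProxyPass without the prefix: the app sees "/"
apply_hook(request, base, "www.example.com", safeguard=False)
print(base, url_for(base, "/test"))
```

Without the safeguard the base parts become [None] and url_for yields //test; with the prefix in the ProxyPass target the same model yields /app1/test.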
One could certainly argue that this approach is a hack because it manipulates the url. Hacks tend to fail under certain circumstances. In this case, it would fail if you were to manually set the X-Forwarded-Host header while accessing the application server directly. I mentioned that in my question. But as the developer, you're probably the only person who has direct access to that application server, as the firewall would only allow external requests to the reverse proxy in a typical production environment. I'll leave it at that.
It's something of a workaround, but this is how I solved this problem:
In the configuration file (.conf) I define the base URL:
base_url => 'https://booking.business-apartments.wien',
This allows me to write templates like this:
%= link_to 'Payment Information' => ( config('base_url') . url_for('intern/invoice/list_payments/') );
Maybe you should try updating Mojolicious to a newer version? I remember I had a similar problem and solved it with code that explicitly defined the URL for the proxy and appended it to every request (similar to lanti's answer). After a Mojolicious update, that code was no longer necessary.
Moreover, I think I use the same config as proposed by Logioniz.
When you mount your app at some point other than the root (/), getting it to work is tricky. Look at Mojolicious::Controller::url_for:
# Make path absolute
my $base_path = $base->path;
unshift @{$path->parts}, @{$base_path->parts};
$base_path->parts([])->trailing_slash(0);
Here you can control what is generated.
I am creating an HTTP client downloader in Python. I am able to correctly download a file such as http://www.google.com/images/srpr/logo11w.png just fine. However, I'm not sure what to actually name the thing.
There is of course the filename at the end of the URL, but is this always reliable?
If I recall correctly, wget uses the following heuristic:
If a Content-Disposition header exists, get the filename from there.
If the filename component of the URL exists (e.g. http://myserver/filename), use that.
If there is no filename component (e.g. http://www.google.com), derive the filename from the Content-Type header (such as index.html for text/html)
In all cases, if this filename is already present in the directory use a numerical suffix, such as index (1).html, or overwrite, depending on configuration.
There are plenty of other flags that control other heuristics, such as creating .html for ASP/DHTML content-types.
In short, it really depends how far you want to go. For most people, doing the first two + basic Content-Type->name mapping should be enough.
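The heuristic above could look roughly like this in Python, using only the standard library. It's a sketch, not what wget does internally: the helper names and the small CONTENT_TYPE_EXT table are hypothetical, `headers` is assumed to be a plain dict with canonical header-name casing (requests' `response.headers` would also work), and the "name (n).ext" suffix style is one choice among several.

```python
import mimetypes
import os
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit

# Hypothetical, deliberately tiny table; mimetypes is only a fallback because its answers vary by platform.
CONTENT_TYPE_EXT = {"text/html": ".html", "text/plain": ".txt", "image/png": ".png"}


def name_from_disposition(value):
    """Step 1: the filename parameter of a Content-Disposition header, if any."""
    if not value:
        return None
    msg = Message()
    msg["Content-Disposition"] = value
    name = msg.get_filename()  # unquoted filename parameter, or None
    if not name:
        return None
    # Never trust a server-supplied path: keep only the last component.
    return posixpath.basename(name.replace("\\", "/")) or None


def name_from_url(url):
    """Step 2: the last path component of the URL, if there is one."""
    return unquote(posixpath.basename(urlsplit(url).path)) or None


def name_from_content_type(value, stem="index"):
    """Step 3: derive a name from the Content-Type, e.g. text/html -> index.html."""
    mime = (value or "").split(";")[0].strip().lower()
    ext = CONTENT_TYPE_EXT.get(mime) or mimetypes.guess_extension(mime) or ""
    return stem + ext


def unique_path(directory, name):
    """Step 4: add a numeric suffix such as "index (1).html" instead of overwriting."""
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(directory, name)
    n = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem} ({n}){ext}")
        n += 1
    return candidate


def choose_filename(url, headers, directory="."):
    name = (
        name_from_disposition(headers.get("Content-Disposition"))
        or name_from_url(url)
        or name_from_content_type(headers.get("Content-Type"))
    )
    return unique_path(directory, name)
```

Taking only the basename of a Content-Disposition filename is a deliberate choice: the value comes from the server, and a name like ../../something should never escape the download directory.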
I have a URL similar to this:
http://one/two/three:four&five=six|seven
I also have
Zend_Uri_Http::setConfig(array('allow_unwise' => true));
in order to be able to use "|". When I try to use
Zend_Http_Client::setUri()
on my URL, I get
Zend_Uri_Exception: Invalid URI supplied
When I hit the URL from the browser, it works. How can I avoid this problem? Any ideas are welcome.
The URL will be valid if you change it to:
http://one/two/three:four?five=six|seven
What is supposed to be the query string in that URI? You have to separate the query string from the path by ? before you can use & to separate arguments.
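Python's urllib.parse (not Zend, and far more lenient, since it validates nothing) makes the point of this answer visible: without ?, everything after the host is path, and & and = are just path characters. `split_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def split_url(url):
    """Return (path, query string, parsed query) for a URL; urlsplit does no validation."""
    parts = urlsplit(url)
    return parts.path, parts.query, parse_qs(parts.query)


for url in ("http://one/two/three:four&five=six|seven",
            "http://one/two/three:four?five=six|seven"):
    print(split_url(url))
```

The first URL has an empty query string; only the second yields the argument five with the value six|seven.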
So it turns out I had a rewrite rule, defined in my httpd-vhosts.conf file, which created a valid URL after my invalid URL was hit. Since I needed to hit the URL from within a Zend Framework PHPUnit test, I applied the rewrite rule manually and got the correct URL.
Moral of the story: put all the relevant info into your question, or else nobody will be able to help you out.
Can anybody give me sample code to check whether a URL has been blocked by robots.txt?
A robots.txt file can specify either a full URL path or a directory.
Is there any helper function in Perl?
Check out WWW::RobotRules:
The following methods are provided:
$rules = WWW::RobotRules->new($robot_name)
This is the constructor for WWW::RobotRules objects. The first
argument given to new() is the name of the robot.
$rules->parse($robot_txt_url, $content, $fresh_until)
The parse() method takes as arguments the URL that was used to
retrieve the /robots.txt file, and the contents of the file.
$rules->allowed($uri)
Returns TRUE if this robot is allowed to retrieve this URL.
WWW::RobotRules is the standard class for parsing robots.txt files and then checking URLs to see if they're blocked.
You may also be interested in LWP::RobotUA, which integrates that into LWP::UserAgent, automatically fetching and checking robots.txt files as needed.
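For comparison only (this is Python, not a Perl answer): Python's standard library ships a comparable parser, urllib.robotparser. A minimal sketch, assuming the robots.txt content is already in memory; RobotFileParser.set_url() followed by read() would fetch it instead. `rules_from_text` and the bot name "MyBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def rules_from_text(robots_txt):
    """Build a parser from robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = rules_from_text(ROBOTS_TXT)
print(rules.can_fetch("MyBot", "http://www.stackoverflow.com/cgi-bin/somwhatelse.pl"))
print(rules.can_fetch("MyBot", "http://www.stackoverflow.com/somwhatelse.pl"))
```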
Load the robots.txt file and search for "Disallow:" in the file. Then check whether the pattern that follows Disallow: is within your URL. If so, the URL is banned by robots.txt.
Example -
You find the following line in the robots.txt:
Disallow: /cgi-bin/
Now remove the "Disallow: " and check whether "/cgi-bin/" (the remaining part) comes directly after the host name, i.e., whether the URL path starts with it.
If your URL looks like:
www.stackoverflow.com/cgi-bin/somwhatelse.pl
it is banned.
If your URL looks like:
www.stackoverflow.com/somwhatelse.pl
it is OK. You'll find the complete set of rules at http://www.robotstxt.org/. This is the way to go if you cannot install additional modules for any reason.
Better would be to use a module from CPAN:
There is a great module on CPAN that I use to deal with this: LWP::RobotUA. LWP (libwww) is IMHO the standard for web access in Perl, and this module is part of it and ensures your crawler behaves politely.
Hum, you don't seem to have even looked! On the first page of search results, I see various download engines that handle robots.txt automatically for you, and at least one that does exactly what you asked.
WWW::RobotRules doesn't handle wildcard ("substring") rules. Given
User-agent: *
Disallow: *anytext*
the URL http://example.com/some_anytext.html is allowed (not banned).
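Wildcards such as `*` (and a trailing `$` end anchor) are an extension honoured by major crawlers and later standardized in RFC 9309. A minimal Python sketch of that matching, assuming only the Disallow lines for a single user agent matter; `rule_to_regex` and `is_disallowed` are hypothetical helpers, and Allow lines, longest-match precedence and percent-encoding normalization are ignored.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a path rule with "*" and an optional trailing "$" into a start-anchored regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    regex = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_disallowed(url, disallow_rules):
    """True if any non-empty Disallow rule matches the URL's path (plus query string)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, so a rule like "/cgi-bin/" still behaves as a plain prefix.
    return any(rule and rule_to_regex(rule).match(target) for rule in disallow_rules)


print(is_disallowed("http://example.com/some_anytext.html", ["*anytext*"]))
```

Under this interpretation, Disallow: *anytext* does ban http://example.com/some_anytext.html, while plain rules such as /cgi-bin/ keep their prefix meaning.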